Abstract Graphical Markov models combine conditional independence constraints with graphical
representations of stepwise data generating processes. The models started to be formulated
about 40 years ago and vigorous development is ongoing. Longitudinal observational studies
as well as intervention studies are best modelled via a subclass called regression graph models
and, especially, traceable regressions. Regression graphs include two types of undirected graph
and directed acyclic graphs in ordered sequences of joint responses. Response components may
correspond to discrete or continuous random variables or to both types and may depend exclusively
on variables which have been generated earlier. These aspects are essential when causal
hypotheses are the motivation for the planning of empirical studies.

To turn the graphs into useful tools for tracing pathways of dependence, for understanding 
development over time and for predicting structure in alternative models, the generated distributions
have to mimic some properties of joint Gaussian distributions. Here, relevant results
concerning these aspects are spelled out and illustrated by examples. With regression graph 
models, it becomes feasible, for the first time, to derive structural effects of (1) ignoring some 
of the variables, of (2) selecting subpopulations via fixed levels of some other variables or of 
(3) changing the order in which the variables might get generated. Thus, the most important 
future applications of these models will aim at the best possible integration of knowledge from 
related studies. 

Keywords Composition property, Conditional dependence, Conditional independence, Connector
transitivity, Directed acyclic graphs, Intersection property, Partial closure, Partial inversion,
Regression graphs, Singleton transitivity, Traceable regressions, Undirected graphs.

Some historical remarks and overview 

Graphical Markov models provide the most flexible tool for formulating, analyzing, and 
interpreting relations among many variables. The models combine and generalise three 
different concepts developed about a century ago: (1) directed graphs, in which variables 
are represented by nodes, used to study linear processes by which joint distributions 
may have been generated (Sewall Wright, [89]), (2) simplification of a joint
distribution with the help of conditional independences (Andrei A. Markov, [56]), and
(3) specification of associations only for variable pairs which are in some sense strongly
related and are turned into nearest neighbors in an undirected graph (Willard Gibbs).

First formulations of graphical Markov models started about 40 years ago, and several
books with differing emphases have appeared since then, for instance [32], [23], [37], [33].
Vigorous development is ongoing. These multivariate statistical models combine the above
simple but most powerful notions: data generating processes in sequences of single or of
joint responses and conditional independences and dependences captured by graphs. Arguably,
the most outstanding feature of these types of models is that many of their implications can be
derived using the graphs. Some of this will be outlined and illustrated here.

The generating processes no longer concern only linear relations, as a century ago,
but they include, among others, linear regressions, [95], generalized linear models, [58],
[2], exponential response models, [33], [7], subclasses of structural equations for longitudinal
studies, models for planned interventions such as controlled clinical trials
with randomized allocation of individuals to treatments, and models for only virtual
interventions, [66], [31]. In particular, response variables may in general be vector
variables that contain discrete or continuous variables or both types as components. 

We concentrate here on ordered series of regressions for which the responses have
as regressors exclusively variables which have been generated earlier, so that they are
in the past of the response. Throughout, we use the terms regression and conditional
distribution interchangeably. The generated distributions are called traceable regressions
when different pathways of development can be traced in a corresponding
graph, called their regression graph. Regression graphs extend graphs for multivariate
regression, which are one of four different types of the so-called chain graphs
introduced in the literature.

Each such graph may represent a research hypothesis on how data could have been 
generated, [108], so that we speak of the starting or the ‘generating graph’. When one
starts with such a general type of graph, one ordering of the joint responses is taken
as fixed and the properties of regression graphs, stated here in Propositions 9 and 10,
assure that their graphical structures have an interpretation in terms of probability
distributions. 

Often the objective is to uncover graphical representations that lead to an understanding
of the generating process for appropriately collected data. Then for each such
study, the starting point is the available substantive knowledge. It is used to decide on
variables that are relevant in a given context and on their ordering into responses, intermediate
and explanatory variables. Explanatory variables or regressors may for instance
be treatments, intermediate outcomes, risks or variables available at baseline, that is at
the start of the study. The last are named context variables since they capture features
of the study or of the study individuals that are taken as given.

Well-fitting graphs are derived by using a combination of information from the study 
design, from statistical analyses that are used to decide on conditional dependences and 
independences, from past empirical evidence and from theoretically postulated relations. 
Detailed analyses are available for some studies, and links to further sizeable empirical
studies are given in an overview.

In the following, we do not discuss fitting or model search procedures in detail.
Instead, we first describe models and graphs for few variables, especially graphs that
are fully directed or that are undirected, because they were developed first and are
still intensively studied, mainly in the context of Bayesian inference or in computer
science. We then proceed to regression graphs and models, to special binary distributions,
and to a summary and some open problems. The main purpose here is to introduce
concepts, especially the interplay between generating processes, graphs, factorizations
of densities, edge matrices and matrix operators to partially modify graphs or matrices.
Simple examples illustrate some of the now available, unifying results.

Directed acyclic graphs and three Vs 

We start by introducing some terms commonly used for graphs in order to discuss 
the three key situations for directed graphs. A ‘graph’ consists of a node set, N = 
{1,..., d}, and one or more edge sets. Nodes are also called vertices. Two distinct nodes 
are said to be ‘coupled’, or to be adjacent, if they are directly linked in the graph. Such 
a link is named an ‘edge’. A ‘simple graph’ has at most one edge for each node pair 
and has no node linked to itself. A graph is ‘complete’ if all its node pairs are coupled. 

A sequence of edges connecting distinct nodes is a ‘path’. By convention, the shortest
type of path is an edge. A ‘directed graph’ has exclusively arrows as edges; it is
‘acyclic’ if it is impossible to return to any starting node by following a ‘direction-preserving
path’, that is a sequence of arrows pointing in the same direction. Directed
acyclic graphs are simple graphs, and each ij-arrow, $i \leftarrow j$, points from a regressor node
j to its response node i, or is said to point from a parent j to its child i. We shorten
the name ‘subgraph induced by a set of nodes’ to ‘subgraph of nodes’; it
keeps just those nodes and the edges present among them in a given graph.



Figure 1: The three types of V in directed acyclic graphs; left: source V, middle: transition
V, right: sink V, called in the literature also a collision V, an unshielded collider
or unmarried parents having a common child.

Fig. 1 shows the three possible types of V in directed acyclic graphs. A subgraph of
three nodes is called ‘a V’ if it has two edges. In each V, there are two ‘outer nodes’
that are both coupled to one common neighbor, the ‘inner node’ of the V. The name
of a V stems from its type of inner node. In a self-explanatory way, the Vs in Fig. 1 are
called a ‘source V’ on the left, a ‘transition V’ in the middle and a ‘sink V’ on the
right. The notion of inner nodes extends to ij-paths with more than three nodes.

For just three variables and in a condensed notation for the generated probability
density functions, the factorizations corresponding to Fig. 1 are

$f_{123} = f_{1|3} f_{2|3} f_3, \quad f_{123} = f_{1|2} f_{2|3} f_3, \quad f_{123} = f_{1|23} f_2 f_3.$

The implied constraints are conditional independence of the outer node pair given the
inner node, both on the left and in the middle, and marginal independence of the outer
node pair, on the right. In the notation introduced by Dawid, one writes these
constraints equivalently as

$(f_{1|23} = f_{1|3}) \iff 1 \perp\!\!\!\perp 2 \mid 3, \quad (f_{1|23} = f_{1|2}) \iff 1 \perp\!\!\!\perp 3 \mid 2, \quad (f_{23} = f_2 f_3) \iff 2 \perp\!\!\!\perp 3,$


again in a condensed notation in which each node denotes also a variable.

Only the generating process in the middle of Fig. 1 specifies a full ordering of all
three variables as (1,2,3), while one cannot distinguish with the graph alone between
(1,2,3) and (2,1,3) for the source V and between (1,2,3) and (1,3,2) for the sink V.
More generally, a directed acyclic graph may be ‘compatible with several orderings’
of the variables such that the set of all independences, that is the ‘independence
structure’ of a graph, remains unchanged. This poses problems for some machine-learning
strategies. In many applications however, one compatible ordering can be taken
as fixed; substantive knowledge may even give a full ordering of all variables.

Parent graphs and three Vs 

A graph is said to form a ‘dependence base’ if a full ordering of the nodes is fixed and
each edge present in the graph means the lack of a conditional independence, typically
a dependence that is considered to be strong in a given context. General properties of
the graphs are also used. For regression graphs, these are stated here in Propositions 9
and 10. Directed acyclic graphs that form a dependence base have been named ‘parent
graphs’, [55], denoted by $G^N_{\mathrm{par}}$. Their defining pairwise relations are in equation (1).

For each node i in the ordered node set, N = (1,..., d), of a parent graph, one knows
which nodes are in ‘the past of node i’, that is in the set $\{>i\} = (i+1, \ldots, d)$. The subset
of nodes in $\{>i\}$ from which arrows start and point to node i is the set of ‘parents of
node i’, denoted by $\mathrm{par}_i$. In $G^N_{\mathrm{par}}$, we have a dependence of each node i on all nodes in
$\mathrm{par}_i$ and independence of i from all other nodes in the past of i. Expressed by using the
$\pitchfork$-notation introduced for non-vanishing dependences by Wermuth and Sadeghi,
we have for j > i in $G^N_{\mathrm{par}}$:

$i \pitchfork j \mid \mathrm{par}_i \setminus \{j\}$ for $j \in \mathrm{par}_i$ and $i \perp\!\!\!\perp j \mid \mathrm{par}_i$ for $j \in \{>i\} \setminus \mathrm{par}_i$. (1)

As mentioned before, one outstanding feature of a graphical Markov model is that its 
consequences can be derived, for instance for marginal or for conditional distributions. 
To illustrate this first for the graphs in Fig. 1, we use a special notation. A ‘boxed-in
node’, $\boxed{\circ}$, indicates conditioning on the levels of the variable at this node, and a
‘crossed-out’ node, $\oslash$, means marginalizing over the variable.

As justified later, we take sink Vs in $G^N_{\mathrm{par}}$ to be edge-inducing by conditioning and
the source and transition Vs to be edge-inducing by marginalizing; each of the Vs of
Fig. 1 introduces a different type of edge. The Vs with the edge-inducing operation on
the inner node are shown in the following line and the induced edges in the line thereafter.

$i \leftarrow \oslash \rightarrow j, \quad i \leftarrow \oslash \leftarrow j, \quad i \rightarrow \boxed{\circ} \leftarrow j$ (2)

$i \leftrightarrow j, \quad i \leftarrow j, \quad i \,\text{---}\, j.$

The induced edges ‘remember at first’ the type of path ends at i, j of the generating V,
but then each $\leftrightarrow$ is replaced by a dashed line, $i \,\text{- - -}\, j$, because no direction
is implied after ignoring a common source and, as explained below, the two types of
undirected dependence can be readily distinguished.

The following example is derived from information on a social survey, [93]. It shows
how conditioning on the inner node of a sink V induces a conditional dependence. To
distinguish underlying continuous variables from discrete ones, the former are drawn
with a circle, the latter with a dot.

[Figure: sink V with inner node Y, income, American banks, about 1980, and outer nodes X, years of formal schooling, and A, gender.]

In American banks in the 1980s, salaries, Y, increased with higher levels of formal
education, X, for both women and men, that is $Y \pitchfork X \mid A$, with A denoting gender.
Men received a clearly higher salary than women at given levels of X, so that
$Y \pitchfork A \mid X$. Furthermore, men and women had equal chances to obtain higher levels of formal
education, $X \perp\!\!\!\perp A$. This implies $X \pitchfork A \mid Y$: for any given level of the salaries, women
had a higher level of formal education than men.

We show in the next section how the above edge-inducing rules mimic the effects 
of marginalizing and conditioning in non-degenerate Gaussian distributions, those that 
have invertible covariance matrices. 


Gaussian distributions generated over parent graphs 

For linear relations in d mean-centered variables $X_i$, a non-degenerate Gaussian distribution
is generated with

$AX = \varepsilon, \quad E(\varepsilon) = 0, \quad \mathrm{cov}(\varepsilon) = \Delta \text{ diagonal},$ (3)

where zero-mean, uncorrelated Gaussian residuals, $\varepsilon_i$, have positive variances $\sigma_{ii|>i}$ and
are in the d × 1 vector $\varepsilon$. Vector X contains the variables $X_i$, and matrix A is ‘unit
upper-triangular’, that is it has ones along the diagonal and zeros below the diagonal.
In row i, it has minus the values of the linear regression coefficients that result when response
$X_i$ is regressed on $X_{>i}$, [93], [98].

In the early literature of econometrics, such linear relations have been discussed as
recursive equations and were written in triangular form; for three variables as:

$X_1 + a_{12} X_2 + a_{13} X_3 = \varepsilon_1,$

$X_2 + a_{23} X_3 = \varepsilon_2,$

$X_3 = \varepsilon_3.$

Note that in a Gaussian distribution generated over a complete parent graph, none of
the regression coefficients vanishes when each response $X_i$ is regressed on all variables
in its past, that is on $X_{>i}$.

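By equation (1), missing edges in the starting graph define the ‘independence constraints’.
For Gaussian distributions generated over parent graphs, these are reflected in
vanishing regression coefficients and as zeros in ‘matrices of equation parameters’.
For example, in the first and third generated distribution of Fig. 1, we can write:

$\begin{pmatrix} 1 & 0 & a_{13} \\ 0 & 1 & a_{23} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix}, \quad \begin{pmatrix} 1 & a_{12} & a_{13} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix},$

while for the second case in Fig. 1, $a_{12}$ and $a_{23}$ are nonzero but $a_{13} = 0$.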

For an explicit distinction between conditional and marginal dependences, we switch
to a more detailed notation for trivariate Gaussian distributions. For instance, $\beta_{1|3.2} = -a_{13}$
is the coefficient of $X_3$ in the linear regression of $X_1$ on $X_2$ and $X_3$, while $\beta_{2|3} = -a_{23}$
is the coefficient of $X_3$ in the linear regression of $X_2$ on $X_3$ alone.

The following relation between marginal and conditional linear least-squares regression
coefficients, due to William Cochran, [13], is called the recursion relation of these
regression coefficients:

$\beta_{1|3} = \beta_{1|3.2} + \beta_{1|2.3}\,\beta_{2|3}.$ (4)

Thus, for a Gaussian distribution generated over the parent graph in the middle of
Fig. 1, which is a transition V, the conditional independence $1 \perp\!\!\!\perp 3 \mid 2$, ($\beta_{1|3.2} = 0$), implies
the marginal dependence $1 \pitchfork 3$, ($\beta_{1|3} \neq 0$), because the edges present in the transition
V mean $\beta_{1|2.3} \neq 0$ and $\beta_{2|3} \neq 0$. This property is shared by trivariate binary distributions.
Joint distributions with this property in its generalized form, given here in equation
(30), are said to be dependence inducing or to satisfy singleton transitivity.
For a Gaussian distribution generated over the parent graph of Fig. 1 on the right, which is a sink
V, the marginal independence $2 \perp\!\!\!\perp 3$ implies the conditional dependence $2 \pitchfork 3 \mid 1$. These
features may best be recognized with equation (5) below, after introducing correlations
and their relations to other types of parameter.
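
Recursion (4) can be checked numerically. The following is a minimal sketch, assuming an illustrative covariance matrix for three mean-centered variables; the numbers and the helper names `marginal_coef` and `partial_coef` are hypothetical and not taken from the paper.

```python
# Illustrative covariance matrix of (X1, X2, X3); the values are assumptions.
S = [[2.0, 0.8, 0.6],
     [0.8, 1.5, 0.5],
     [0.6, 0.5, 1.0]]


def marginal_coef(S, i, j):
    """Least-squares coefficient of X_j when X_i is regressed on X_j alone."""
    return S[i][j] / S[j][j]


def partial_coef(S, i, j, k):
    """Least-squares coefficient of X_j when X_i is regressed on X_j and X_k."""
    # Covariance of (i, j) and variance of j, both conditionally given X_k.
    s_ij_k = S[i][j] - S[i][k] * S[k][j] / S[k][k]
    s_jj_k = S[j][j] - S[j][k] ** 2 / S[k][k]
    return s_ij_k / s_jj_k


b13 = marginal_coef(S, 0, 2)       # beta_{1|3}
b13_2 = partial_coef(S, 0, 2, 1)   # beta_{1|3.2}
b12_3 = partial_coef(S, 0, 1, 2)   # beta_{1|2.3}
b23 = marginal_coef(S, 1, 2)       # beta_{2|3}

# Cochran's recursion: the marginal coefficient equals the conditional one
# plus the product of the two coefficients along the path 1 <- 2 <- 3.
recursion_gap = b13 - (b13_2 + b12_3 * b23)
print(b13, b13_2 + b12_3 * b23)
```

Setting `b13_2` to zero in such a system while keeping `b12_3` and `b23` nonzero leaves a nonzero `b13`, which is the dependence-inducing property described for the transition V.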

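With the covariance matrix denoted by $\Sigma$ and its inverse, the concentration matrix,
by $\Sigma^{-1}$, we write explicitly

$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ . & \sigma_{22} & \sigma_{23} \\ . & . & \sigma_{33} \end{pmatrix}, \quad \Sigma^{-1} = \begin{pmatrix} \sigma^{11} & \sigma^{12} & \sigma^{13} \\ . & \sigma^{22} & \sigma^{23} \\ . & . & \sigma^{33} \end{pmatrix}.$

The .-notation indicates symmetric entries. The diagonal elements of $\Sigma$ are the ‘variances’,
$\sigma_{ii} = E(X_i^2)$, and the off-diagonal elements are the ‘covariances’, $\sigma_{ij} = E(X_i X_j)$,
of the mean-centered $X_i, X_j$. The diagonal elements of $\Sigma^{-1}$ are the ‘precisions’,
$\sigma^{ii}$, the off-diagonal elements are the ‘concentrations’, $\sigma^{ij}$.

The ‘correlation coefficient’, $\rho_{23}$, and the ‘partial correlation coefficient’, $\rho_{23|1}$,
relate to the other parameters and to each other via

$\rho_{23} = \sigma_{23}/\sqrt{\sigma_{22}\sigma_{33}}, \quad \rho_{23|1} = -\sigma^{23}/\sqrt{\sigma^{22}\sigma^{33}} = (\rho_{23} - \rho_{12}\rho_{13})/\sqrt{(1-\rho_{12}^2)(1-\rho_{13}^2)},$

$\beta_{2|3} = \sigma_{23}/\sigma_{33} = -\sigma^{23.1}/\sigma^{22.1}, \quad \beta_{1|3.2} = \sigma_{13|2}/\sigma_{33|2} = -\sigma^{13}/\sigma^{11}.$

In this notation, $\sigma^{23.1}$ is the concentration of (2, 3) after marginalizing over $X_1$ and $\sigma_{13|2}$
is the covariance of (1, 3) conditionally given $X_2 = x_2$.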

Correlations are best suited to reflect the strength of linear dependences, here those
induced by the independence constraints. With $2 \perp\!\!\!\perp 3$ and with $1 \perp\!\!\!\perp 3 \mid 2$, the induced
conditional and marginal dependences are, respectively,

$\rho_{23|1}^{*} = -\rho_{12|3}\,\rho_{13|2}, \quad \rho_{13}^{*} = \rho_{12}\,\rho_{23}.$ (5)

Thus, the induced linear dependence can be considerably stronger for a marginal than
for a conditional independence. For instance with $2 \perp\!\!\!\perp 3$, there is $-\rho_{23|1}^{*} > 0.96$ if $\rho_{12} =
\rho_{13} = 0.7$, and $\Sigma^{-1}$ does not exist if $\rho_{12} = \rho_{13} > \sqrt{0.5}$. By contrast, if $\rho_{12} = \rho_{23} = 0.7$
and $1 \perp\!\!\!\perp 3 \mid 2$, the induced marginal correlation is only $\rho_{13}^{*} = 0.49$.
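
The two numerical values quoted above can be reproduced from the standard formula for a partial correlation. This is a small sketch; the function name `partial_corr` is a hypothetical helper, and only the correlations 0.7 come from the text.

```python
import math


def partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation of (i, j) given k, from the three marginal correlations."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Sink V with inner node 1: nodes 2 and 3 are marginally independent (rho_23 = 0).
rho12 = rho13 = 0.7
rho23_given_1 = partial_corr(0.0, rho12, rho13)   # induced conditional correlation

# Transition V with inner node 2: 1 and 3 are independent given 2.
rho12_t = rho23_t = 0.7
rho13_marginal = rho12_t * rho23_t                 # induced marginal correlation

# With rho_23 = 0, the determinant of the 3 x 3 correlation matrix is
# 1 - rho12^2 - rho13^2, which reaches zero at rho12 = rho13 = sqrt(0.5).
det_at_limit = 1 - 2 * math.sqrt(0.5) ** 2

print(-rho23_given_1, rho13_marginal)
```

The first printed value exceeds 0.96 and the second is 0.49 up to rounding, matching the comparison made in the text.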



Some properties of Gaussian distributions

There are recursions also for concentrations, [23], and for covariances, [3]:

$\sigma^{23.1} = \sigma^{23} - \sigma^{12}\sigma^{13}/\sigma^{11}, \quad \sigma_{13|2} = \sigma_{13} - \sigma_{12}\sigma_{23}/\sigma_{22}.$ (6)

The first recursion shows that $0 = \sigma^{23.1} = \sigma^{23}$, that is both of ($2 \perp\!\!\!\perp 3$ and $2 \perp\!\!\!\perp 3 \mid 1$) hold,
if ($\sigma^{12} = 0$ or $\sigma^{13} = 0$) in addition. Similarly, the second recursion shows that both
of ($1 \perp\!\!\!\perp 3 \mid 2$ and $1 \perp\!\!\!\perp 3$) hold if ($\sigma_{12} = 0$ or $\sigma_{23} = 0$) in addition. Thus, an independence
statement involving the third variable is needed for a variable pair to be both marginally
and conditionally independent. This is the simplest case of inducing dependences, that
is of ‘singleton transitivity’; see [100] and here equation (30).

Recursion relations such as in equations (4) and (6) and their connection to the
elements of the above matrices A show also that in trivariate Gaussian distributions
‘conditional independences combine downwards’ as:

$(1 \perp\!\!\!\perp 2 \mid 3 \text{ and } 1 \perp\!\!\!\perp 3 \mid 2) \implies \{1 \perp\!\!\!\perp (2,3) \iff f_{123} = f_1 f_{23}\} \implies (1 \perp\!\!\!\perp 2 \text{ and } 1 \perp\!\!\!\perp 3),$

that is they satisfy what is also called the ‘intersection property’. Furthermore, in
these distributions ‘conditional independences combine upwards’ as:

$(2 \perp\!\!\!\perp 3 \text{ and } 1 \perp\!\!\!\perp 3) \implies \{3 \perp\!\!\!\perp (1,2) \iff f_{123} = f_{12} f_3\} \implies (2 \perp\!\!\!\perp 3 \mid 1 \text{ and } 1 \perp\!\!\!\perp 3 \mid 2),$

that is they satisfy what is also called the ‘composition property’.

In the information theory literature, non-degenerate Gaussian distributions have
been characterized by the above properties in terms of graphoids; these structures satisfy
the properties common to all probability distributions plus intersection, [67], [87]:

Proposition 1 (Lněnička and Matúš, [50]). Gaussian distributions are singleton-transitive,
compositional graphoids.

To make graphs useful tools for empirical studies, the distributions generated over
dependence-base graphs have to share the properties of Prop. 1 and are then called
‘traceable regressions’, [100]: their graphs can be used to trace developmental pathways;
see Example 2, given later.

Families of discrete distributions which violate singleton transitivity, the intersection
or the composition property require very special types of parametrizations. For
the combination of independence statements of regression graphs, the intersection and
the composition property are always used. These two properties also hold in distributions
generated over parent graphs; see [55], discussion of Lemma 1, provided the
ordering and the dependences are indeed as given with equation (1).

The relations between linear parameters, discussed above, generalize to more than 
three variables, but switching to a matrix notation and to edge matrix representations of 
graphs becomes useful for discussing most independence properties in general; for joint 
Gaussian distributions, see for instance [55], Appendix 2. Here, we start again with the
simplest type of edge matrices, those of the graphs of Fig. 1.

Structural versus parametric implications

An edge matrix $\mathcal{A}$ can be viewed as the sum of an identity matrix, I, and what has been
named the adjacency matrix in graph theory: a square binary matrix with an ij-one if
there is a directed edge in the graph and an additional ji-one for an undirected edge.
The small change of adding I leads to well-defined matrix products which can be used
to derive structural consequences of a given generating graph. As we shall see, such
structural consequences may differ from those of a given generating set of parameters.

The edge matrices $\mathcal{A}$ in Table 1 share the unit upper-triangular form with the linear
equation parameter matrices A given with the generating equations (3).


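Table 1: Edge matrices $\mathcal{A}$ for the three Vs of Fig. 1

$\mathcal{A}$ of $1 \leftarrow 3 \rightarrow 2$: $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$

$\mathcal{A}$ of $1 \leftarrow 2 \leftarrow 3$: $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}$

$\mathcal{A}$ of $2 \rightarrow 1 \leftarrow 3$: $\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
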

For instance, with $M^T$ denoting the transpose of a matrix M, Gaussian systems,
$AX = \varepsilon$ of equation (3), imply as covariance and concentration matrices,

$\Sigma = A^{-1} \Delta (A^{-1})^T, \quad \Sigma^{-1} = A^T \Delta^{-1} A,$ (7)

where the matrix pairs $(A, \Delta^{-1})$ and $(A^{-1}, \Delta)$ are ordered Cholesky decompositions or
‘triangular decompositions’ of $\Sigma^{-1}$ and $\Sigma$, respectively, [98].
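
As a sketch of equation (7), the following computes both matrices for a transition V, $1 \leftarrow 2 \leftarrow 3$, in plain Python. The parameter values and all helper names are illustrative assumptions; the inverse of the unit upper-triangular A is obtained from a finite series because $A - I$ is nilpotent.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    d = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def transpose(P):
    return [list(row) for row in zip(*P)]


def diag(values):
    d = len(values)
    return [[values[i] if i == j else 0.0 for j in range(d)] for i in range(d)]


def inv_unit_upper(A):
    """Inverse of a unit upper-triangular matrix: I - B + B^2 - ..., with B = A - I."""
    d = len(A)
    eye = diag([1.0] * d)
    B = [[A[i][j] - eye[i][j] for j in range(d)] for i in range(d)]
    result, term = eye, eye
    for _ in range(d - 1):
        term = [[-x for x in row] for row in matmul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(d)] for i in range(d)]
    return result


# Transition V: a12 and a23 nonzero, a13 = 0 (values are assumptions).
A = [[1.0, -0.5, 0.0],
     [0.0, 1.0, -0.8],
     [0.0, 0.0, 1.0]]
delta = [1.0, 2.0, 1.5]   # residual variances, the diagonal of Delta

A_inv = inv_unit_upper(A)
Sigma = matmul(matmul(A_inv, diag(delta)), transpose(A_inv))
Sigma_inv = matmul(matmul(transpose(A), diag([1.0 / v for v in delta])), A)

print(Sigma[0][2], Sigma_inv[0][2])
```

In this example the (1,3) concentration is zero, reflecting $1 \perp\!\!\!\perp 3 \mid 2$, while the (1,3) covariance is nonzero, reflecting the induced marginal dependence.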

For Gaussian distributions, zero elements in $\Sigma$ and $\Sigma^{-1}$ coincide with those independences
that hold, more generally, in covariance and concentration graphs,
respectively, of other types of distribution:

$(\sigma_{ij} = 0) \iff i \perp\!\!\!\perp j, \quad (\sigma^{ij} = 0) \iff (i \perp\!\!\!\perp j \mid N \setminus \{i,j\}).$

For the dependence base Vs of Fig. 1, it may be checked directly that with $\Delta_{ii} =
\sigma_{ii|\mathrm{par}_i} > 0$, a nonzero element is induced in different positions in row one of $\Sigma$ for the
source V and for the transition V, while a nonzero element is induced in position (2,3)
of $\Sigma^{-1}$ for the sink V; see also equation (5).

In general, implications of a graph result via transformations of edge matrices. The
edge matrix $\mathcal{A}$ of $G^N_{\mathrm{par}}$, for node set N of size d, is the d × d unit upper-triangular matrix
$\mathcal{A} = (\mathcal{A}_{ij})$ such that

$\mathcal{A}_{ij} = \begin{cases} 1 & \text{if and only if } i \leftarrow j \text{ in } G^N_{\mathrm{par}} \text{ or } i = j, \\ 0 & \text{otherwise.} \end{cases}$ (8)

For path interpretations, a definition of node j being an ‘ancestor’ of ‘descendant’
i is needed: a direction-preserving path starts at j and leads to node i. We will now
derive the edge matrix transformation that turns every ancestor in $G^N_{\mathrm{par}}$ into a parent.












The k-th power of the adjacency matrix $(\mathcal{A} - I)$ is known to count for each i < j
in $G^N_{\mathrm{par}}$ the number of direction-preserving paths of length k connecting nodes i and j.
Since the longest of such paths has d − 1 edges, zero matrices $(\mathcal{A} - I)^k$ result for all
k > d − 1. Thus, the edge matrix of the ancestor graph of $G^N_{\mathrm{par}}$, denoted by $\mathcal{A}^-$, becomes

$\mathcal{A}^- = \mathrm{In}[(2I - \mathcal{A})^{-1}], \quad (2I - \mathcal{A})^{-1} = I + (\mathcal{A} - I) + (\mathcal{A} - I)^2 + \ldots + (\mathcal{A} - I)^{d-1},$

where ‘In’ is the indicator function that replaces every positive entry of a nonnegative
matrix by a one. The above sum is the matrix analogue to the sum of an infinite
geometric series, where for $|a| < 1$, one obtains $(1 - a)^{-1} = 1 + a + a^2 + \ldots$, ([53]).
This is generalized here in equation (13). The edge matrix analogue to equation
(7) is introduced next.
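
The ancestor edge matrix can be computed directly from the finite sum of powers. The sketch below assumes an illustrative five-node parent graph with arrows $1 \leftarrow 2$, $1 \leftarrow 3$, $3 \leftarrow 5$ and $4 \leftarrow 5$; the function names are hypothetical.

```python
def matmul(P, Q):
    """Product of two square integer matrices given as lists of rows."""
    d = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def indicator(M):
    """The 'In' operator: replace every positive entry by a one."""
    return [[1 if x > 0 else 0 for x in row] for row in M]


def ancestor_edge_matrix(A):
    """In[I + (A - I) + ... + (A - I)^(d-1)] for a unit upper-triangular edge matrix A."""
    d = len(A)
    eye = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    B = [[A[i][j] - eye[i][j] for j in range(d)] for i in range(d)]  # adjacency matrix
    total, power = eye, eye
    for _ in range(d - 1):
        power = matmul(power, B)   # entry (i, j) counts paths of the current length
        total = [[total[i][j] + power[i][j] for j in range(d)] for i in range(d)]
    return indicator(total)


# Edge matrix: entry (i, j) is one if i <- j or i = j (nodes 1..5 as indices 0..4).
A = [[1, 1, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1]]

A_anc = ancestor_edge_matrix(A)
for row in A_anc:
    print(row)
```

Compared with `A`, the only new one is in position (1,5): node 5 is an ancestor of node 1 via the direction-preserving path through node 3.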

With the edge matrices $\mathcal{A}$ and $\mathcal{A}^-$, the consequences of the starting graph, $G^N_{\mathrm{par}}$, for
pairwise marginal and for conditional independences given all remaining variables can
be directly given. An implied independence $i \perp\!\!\!\perp j$ and $i \perp\!\!\!\perp j \mid N \setminus \{i,j\}$, respectively, is
indicated by a zero in positions (i,j) of

$\mathcal{N}_{NN} = \mathrm{In}[\mathcal{A}^- (\mathcal{A}^-)^T], \quad \mathcal{N}^{NN} = \mathrm{In}[\mathcal{A}^T \mathcal{A}],$ (9)

that is in the edge matrices of the ‘overall covariance and concentration graph
induced by $G^N_{\mathrm{par}}$’; see also equation (7) and the next section.

Such zeros are said to be ‘structurally induced’ because they result for all distributions
that factorize as prescribed by a given generating graph. With the examples
in Fig. 2 and Fig. 3 in the next section, the types of path are identified which induce
more edges than there are present in a starting parent graph and therefore lead to more
complex structures, captured in one or both of the two induced undirected graphs.

These edge-inducing paths introduce an additional dependence in Gaussian distributions
generated over parent graphs, provided no other constraints apply than the
pairwise independences and dependences defining their generating graph; see equation
(1). Otherwise, contributions of several paths may get cancelled.

If an independence is not structurally induced, then it may still get generated by
particular constellations of the parameters. Such cases have been called ‘parametric
cancellation’, [101], or ‘lack of faithfulness to the graph’, [81]. For instance, a
parametric cancellation occurs if in equation (4), one has $\beta_{1|3.2} = -\beta_{1|2.3}\beta_{2|3}$. This leads
to a zero in position (1, 3) of $\Sigma$ even when $1 \pitchfork 2 \mid 3$ and $2 \pitchfork 3$ and hence to a non-structural
independence, $1 \perp\!\!\!\perp 3$, for Gaussian distributions.
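
A parametric cancellation of this kind is easy to produce numerically. The sketch below builds the covariances of a complete three-variable recursive linear system from its regression coefficients; the coefficient values and the function name `covariances` are illustrative assumptions.

```python
def covariances(b12_3, b13_2, b23, var_e1, var_e2, var_e3):
    """Covariances implied by X3 = e3, X2 = b23*X3 + e2, X1 = b12_3*X2 + b13_2*X3 + e1,
    with uncorrelated residuals e1, e2, e3."""
    s33 = var_e3
    s23 = b23 * s33
    s22 = b23 * s23 + var_e2
    s13 = b12_3 * s23 + b13_2 * s33
    s12 = b12_3 * s22 + b13_2 * s23
    s11 = b12_3 * s12 + b13_2 * s13 + var_e1
    return {"s11": s11, "s12": s12, "s13": s13, "s22": s22, "s23": s23, "s33": s33}


# All three edges of the complete graph are present (all coefficients nonzero),
# but beta_{1|3.2} = -beta_{1|2.3} * beta_{2|3}, so the two paths from 3 to 1 cancel.
cancelled = covariances(b12_3=0.5, b13_2=-0.4, b23=0.8,
                        var_e1=1.0, var_e2=1.0, var_e3=1.0)

# A generic choice of beta_{1|3.2} leaves a nonzero marginal covariance of (1, 3).
generic = covariances(b12_3=0.5, b13_2=0.3, b23=0.8,
                      var_e1=1.0, var_e2=1.0, var_e3=1.0)

print(cancelled["s13"], generic["s13"])
```

The zero covariance in the first case is a property of the particular parameter values, not of the graph: any small change to one coefficient restores the marginal dependence.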

Some consequences of a five-node parent graph 

For five ordered nodes, N = (1,..., 5), Fig. 2 shows a parent graph, $G^N_{\mathrm{par}}$, which contains
the three types of V of Fig. 1. Edges present and edges missing are defined by equation
(1). The factorization of $f_N$ can be read directly off the graph:

$f_N = f_{1|23}\, f_2\, f_{3|5}\, f_{4|5}\, f_5.$

Also, the graph can be drawn using the given order of the nodes and this factorization.

To generate the joint distribution over $G^N_{\mathrm{par}}$, one starts with $f_5$, generates $f_{4|5}$ next,
then $f_{3|5}$, then $f_2$ and finally $f_{1|23}$. The defining pairwise dependences in equation (1)
give

$1 \pitchfork 2 \mid 3, \quad 1 \pitchfork 3 \mid 2, \quad 3 \pitchfork 5, \quad 4 \pitchfork 5,$

so that no simpler factorization holds in distributions generated over this parent graph.

Figure 2: left: the generating parent graph, a small $G^N_{\mathrm{par}}$ with the three types of V, with
source node 5, transition node 3 and sink node 1; right: the induced concentration graph with
an additional edge for (2,3) due to conditioning on the common sink node 1 of (2,3).

From the pairwise independences in equation (1) or from the factorization of $f_N$, one
obtains the defining independence structure of $G^N_{\mathrm{par}}$ as:

$1 \perp\!\!\!\perp \{4,5\} \mid \{2,3\}, \quad 2 \perp\!\!\!\perp \{3,4,5\}, \quad 3 \perp\!\!\!\perp 4 \mid 5.$
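
The defining independence structure can be listed mechanically from the parent sets: each node is independent of the non-parents in its past, given its parents. A minimal sketch, assuming the parent sets read off the factorization of $f_N$; the function name is hypothetical.

```python
# Parent sets of the five-node parent graph, read off f_N = f_{1|23} f_2 f_{3|5} f_{4|5} f_5.
parents = {1: {2, 3}, 2: set(), 3: {5}, 4: {5}, 5: set()}


def defining_independences(parents, d):
    """For each node i, return (i, non-parents in the past of i, parents of i),
    standing for: i is independent of the non-parents given the parents."""
    statements = []
    for i in range(1, d + 1):
        past = set(range(i + 1, d + 1))
        non_parents = past - parents[i]
        if non_parents:  # nodes whose whole past consists of parents give no constraint
            statements.append((i, sorted(non_parents), sorted(parents[i])))
    return statements


structure = defining_independences(parents, 5)
for node, independent_of, given in structure:
    print(node, "independent of", independent_of, "given", given)
```

Nodes 4 and 5 contribute no statement, since node 4 depends on its only past node and node 5 has an empty past.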

All further implied independences may, in principle, be derived directly from such
a list of independences by using the properties of the starting graph. We turn to these
properties in Props. 9 and 10. Similarly, further implied dependences can be obtained by
using the factorization of $f_N$ and the information that the factorization cannot be further
simplified. But one may instead use the edge-inducing properties, [68], of Vs in parent
graphs, extending the discussion above for three-node graphs. Proofs may be based on
a proposition given later. We start with consequences of the sink V in Fig. 2.

It can be derived that for every sink V with outer nodes i, j, all independence statements
for i and j implied by $G^N_{\mathrm{par}}$ exclude the inner sink node ‘o’. Here we have for instance
$2 \perp\!\!\!\perp 3$, $2 \perp\!\!\!\perp 3 \mid 4$ and $2 \perp\!\!\!\perp 3 \mid \{4,5\}$, so that there are several subsets c of $N \setminus \{i,o,j\}$
for which $i \perp\!\!\!\perp j \mid c$ is implied by the parent graph, here e.g. $c = \emptyset$, $c = \{4\}$, $c = \{4,5\}$.
Thus, given a sink V, there are $c \subseteq N \setminus \{i,o,j\}$ such that $i \perp\!\!\!\perp j \mid c$ is implied by the graph.
For each such c,

nodes (i,o,j) forming a sink V in $G^N_{\mathrm{par}}$ $\implies (i \perp\!\!\!\perp j \mid c \implies i \pitchfork j \mid oc).$ (10)

In Fig. 2 for instance, $2 \pitchfork 3 \mid 1$, $2 \pitchfork 3 \mid \{1,4\}$ and $2 \pitchfork 3 \mid \{1,4,5\}$ are induced. For Gaussian
distributions, the size of such dependences can be expressed in terms of induced partial
correlations, in a similar way as in equation (5).

The concentration graph induced by $G^N_{\mathrm{par}}$ involves conditioning on all nodes. The
additional edges result by closing sink Vs, as captured by $\mathcal{N}^{NN}$ in equation (9). More
edges represent in general a more complex structure and in cases with complete, undirected
subgraphs of three or more nodes, it cannot be recognized from a concentration
graph alone which edges are due to conditioning on sink Vs in $G^N_{\mathrm{par}}$.

We turn next to consequences of transition and source Vs by using Fig. 3. It can be
derived that for every transition V with outer nodes i, j, all independence statements
for i and j implied by $G^N_{\mathrm{par}}$ include the inner node ‘o’. In Fig. 3 we have for instance
$1 \perp\!\!\!\perp 5 \mid 3$ and $1 \perp\!\!\!\perp 5 \mid \{2,3\}$, so that there are several subsets c of $N \setminus \{i,o,j\}$ for which $i \perp\!\!\!\perp j \mid oc$
is implied by the parent graph, here such as $c = \emptyset$, $c = \{2\}$.

Figure 3: left: the same generating $G^N_{\mathrm{par}}$ as in Fig. 2; middle: the corresponding ancestor
graph, also called the transitive closure of $G^N_{\mathrm{par}}$; right: the induced covariance graph with
new edges for (1,4), (1,5), (3,4) compared to $G^N_{\mathrm{par}}$.

Thus, given a transition V, there are $c \subseteq N \setminus \{i,o,j\}$ such that $i \perp\!\!\!\perp j \mid oc$ is implied
by the graph. For each such c,

nodes (i,o,j) forming a transition V in $G^N_{\mathrm{par}}$ $\implies (i \perp\!\!\!\perp j \mid oc \implies i \pitchfork j \mid c).$ (11)

In Fig. 3 for instance, $1 \pitchfork 5$ and $1 \pitchfork 5 \mid 2$ are induced. A fully analogous statement
results by replacing in the previous paragraphs each time ‘transition node’ by ‘source
node’. Just the examples relating to Fig. 3 change.

The edge matrix $\mathcal{N}_{NN}$ in equation (9) shows that by moving from the ancestor graph
to the induced covariance graph, every source V in the former is closed by an edge. Then,
in the overall covariance graph induced by $G^N_{\mathrm{par}}$, there is an additional ij-edge if either
j is an ancestor of i or i and j have a common ancestor.
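
Both induced undirected graphs of equation (9) can be computed for the five-node parent graph with arrows $1 \leftarrow 2$, $1 \leftarrow 3$, $3 \leftarrow 5$, $4 \leftarrow 5$. The following is a sketch in plain Python; all function and variable names are hypothetical.

```python
def matmul(P, Q):
    d = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def transpose(P):
    return [list(row) for row in zip(*P)]


def indicator(M):
    """The 'In' operator: replace every positive entry by a one."""
    return [[1 if x > 0 else 0 for x in row] for row in M]


def ancestor_edge_matrix(A):
    d = len(A)
    eye = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    B = [[A[i][j] - eye[i][j] for j in range(d)] for i in range(d)]
    total, power = eye, eye
    for _ in range(d - 1):
        power = matmul(power, B)
        total = [[total[i][j] + power[i][j] for j in range(d)] for i in range(d)]
    return indicator(total)


def edge_set(M):
    """Node pairs (i, j), i < j, numbered from 1, with a one in position (i, j)."""
    d = len(M)
    return {(i + 1, j + 1) for i in range(d) for j in range(i + 1, d) if M[i][j]}


A = [[1, 1, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 1]]

A_anc = ancestor_edge_matrix(A)
cov_edges = edge_set(indicator(matmul(A_anc, transpose(A_anc))))   # In[A^- (A^-)^T]
con_edges = edge_set(indicator(matmul(transpose(A), A)))           # In[A^T A]
parent_edges = edge_set(A)

print(sorted(cov_edges - parent_edges), sorted(con_edges - parent_edges))
```

For this graph, the covariance graph gains the edges (1,4), (1,5) and (3,4), and the concentration graph gains only (2,3), in agreement with the descriptions of Fig. 3 and Fig. 2.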

Unless $G^N_{\mathrm{par}}$ contains exclusively sink Vs, there will be more edges in the induced
covariance graph. And again, whenever three or more nodes are contained in some of its
complete subgraphs, it is impossible to see from the induced graph alone whether additional
dependences have been generated. Therefore, the same type of general warning
as above applies to using the class of covariance graphs for model selection. But furthermore,
when a learning strategy is based on only the relations among variable pairs, no
joint distribution may exist for such a given set of two-way margins that results from
joint distributions with higher-order interactions; for an example see [100].

To summarize, with information on the ordering of the variables, simpler structures
will typically be uncovered, unless no additional edges are introduced, so that, say, a
starting $G^N_{\mathrm{par}}$ and a concentration graph have the same edge and node sets but different
types of edge. In such important special situations, there is ‘Markov equivalence’,
that is the same independence structure is captured by two different graphs; see
Prop. [5] below.

Undirected generating graphs 

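Suppose now that variables are unordered, that is, arising at the same time, like several symptoms of a disease or local consequences of a global economic shock. Their joint distribution could then have a generating concentration graph, $G^N_{\mathrm{con}}$, or a generating covariance graph, $G^N_{\mathrm{cov}}$. The defining pairwise independences for $G^N_{\mathrm{con}}$ are $i \perp\!\!\!\perp j \mid N \setminus \{i,j\}$ and those for $G^N_{\mathrm{cov}}$ are $i \perp\!\!\!\perp j$. For dependence base undirected graphs, each $ij$-edge present means, with `---` denoting a full line and `- -` a dashed line:

$i$ --- $j$ $\iff$ $i \pitchfork j \mid N \setminus \{i,j\}$ in $G^N_{\mathrm{con}}$ and $i$ - - $j$ $\iff$ $i \pitchfork j$ in $G^N_{\mathrm{cov}}$.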
To read all implied independences off their graphs, a standard separation criterion from graph theory can be applied. For this, one says ‘a path intersects a subset’ $g$ of node set $N$ if it has an inner node in $g$. We let next $\{\alpha, \beta, c, m\}$ partition node set $N$, where only $m$ or $c$ may be empty sets. This notation is to remind one that with any independence statement $\alpha \perp\!\!\!\perp \beta \mid c$, one implicitly has marginalised over the remaining nodes in $m = N \setminus \{\alpha \cup \beta \cup c\}$; one considers the joint distribution of $Y_\alpha$, $Y_\beta$ given $Y_c$.

Proposition 2 (Darroch et al.). A generating concentration graph, $G^N_{\mathrm{con}}$, implies $\alpha \perp\!\!\!\perp \beta \mid c$ if every path between $\alpha$ and $\beta$ intersects $c$.

Proposition 3 (Kauermann, [32]). A generating covariance graph, $G^N_{\mathrm{cov}}$, implies $\alpha \perp\!\!\!\perp \beta \mid c$ if every path between $\alpha$ and $\beta$ intersects $m$.
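The two separation criteria can be checked mechanically with a breadth-first search. The following is a minimal sketch in Python and is not taken from the cited papers; the function name `every_path_intersects` and the four-node chain are assumptions made only for illustration. For a concentration graph the blocking set is $c$, for a covariance graph it is $m$.

```python
from collections import deque


def every_path_intersects(edges, alpha, beta, g):
    """Return True if every path between the node sets alpha and beta
    has an inner node in g. The sets alpha, beta, g are assumed disjoint."""
    neighbours = {}
    for i, j in edges:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    blocked = set(g)
    seen = set(alpha)
    queue = deque(alpha)
    while queue:
        node = queue.popleft()
        for nb in neighbours.get(node, ()):
            if nb in beta:
                # beta was reached without passing through g
                return False
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                queue.append(nb)
    return True


# Hypothetical chain 1 - 2 - 3 - 4, read once with full and once with dashed lines.
chain = [(1, 2), (2, 3), (3, 4)]

# Concentration graph, c = {2}, m = {4}: is 1 independent of 3 given 2 implied?
con_implied = every_path_intersects(chain, {1}, {3}, {2})

# Covariance graph, c = {4}, m = {2}: is 1 independent of 3 given 4 implied?
cov_implied = every_path_intersects(chain, {1}, {3}, {2})

# Covariance graph, c = {2}, m = {4}: conditioning on the inner node 2.
cov_not_implied = every_path_intersects(chain, {1}, {3}, {4})
```

In this sketch the same routine serves both propositions; only the set passed as `g` changes, which mirrors the duality between conditioning in a concentration graph and marginalizing in a covariance graph.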

Whenever these undirected generating graphs also form dependence bases, the converse holds as well: if a node in $\alpha$ is connected to one in $\beta$ by a path that does not intersect $c$ in $G^N_{\mathrm{con}}$ or that does not intersect $m$ in $G^N_{\mathrm{cov}}$, then $\alpha \pitchfork \beta \mid c$ is implied. Some additional effects of Prop. 2 and 3 result by considering ‘$a$-line $ij$-paths’: those which connect node pair $i,j$ and have all inner nodes in $a \subset N$.

Corollary 1 By marginalizing over any subset $a$ of $N$ in $G^N_{\mathrm{con}}$, all $a$-line paths are closed while by conditioning on $a$, its subgraph of $N \setminus a$ is induced for $N \setminus a$. By conditioning on subset $a$ in $G^N_{\mathrm{cov}}$, all $a$-line paths are closed while by marginalizing over $a$, its subgraph of $N \setminus a$ is induced for $N \setminus a$.

Prop. 2 and 3 imply more for ‘connected graphs’, that is when the nodes of every node pair can be reached via some path.

Corollary 2 A connected $G^N_{\mathrm{con}}$ induces for node set $N$ a complete covariance graph and a connected $G^N_{\mathrm{cov}}$ induces for $N$ a complete concentration graph.

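To summarize, in $G^N_{\mathrm{con}}$, each full-line V is edge-inducing by marginalizing and in $G^N_{\mathrm{cov}}$, each dashed-line V is edge-inducing by conditioning, where we take again the induced edges to remember the edge-ends of the starting V. Note again that for Gaussian distributions generated over a dependence base $G^N_{\mathrm{con}}$ or $G^N_{\mathrm{cov}}$, an induced edge coincides always with an induced dependence. With inner node $o$ marginalized over in the first V and conditioned on in the second, the Vs (first row) and their induced edges (second row) are:

    i --- o --- j,     i - - o - - j
                                          (12)
    i --- j,           i - - j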

The edge matrix of a complete generating graph of G^ ov or G^ on is a d x d matrix 
of ones. It has d — 1 zero eigenvalues and one eigenvalue equal to d. Hence, it is not 
invertible, but by subtracting it from a $(d+1)$ multiple of an identity matrix, one obtains
a well-posed inversion task, [90]. In the statistical literature, this type of Tikhonov 
regularization was introduced some fifteen years later in the form of ridge regression; 
a seemingly ill-posed problem is solved by increasing the diagonal elements of the matrix. 

If we denote by $\mathcal{W}$ any of the symmetric edge matrices of a generating $G^N_{\mathrm{con}}$ or $G^N_{\mathrm{cov}}$, then the corresponding edge matrices induced for the covariance or the concentration graph are of the type:

$\mathcal{W}^{-} = \mathrm{In}[\{(d+1)\mathcal{I} - \mathcal{W}\}^{-1}]$. (13)
By definition, the matrix $(d+1)\mathcal{I} - \mathcal{W}$ preserves the zero pattern of a given edge matrix $\mathcal{W}$ and it is an M-matrix, so that its inverse is nonnegative. The concept of an M(inkowski)-matrix was introduced and studied by Ostrowski, [63], [64], without any applications concerning graphs or statistics; it is an invertible matrix with exclusively nonpositive off-diagonal elements. For undirected generating graphs, the M-matrix in equation (13) turns each connected component into a complete subgraph.
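The effect of equation (13) can be illustrated numerically. The sketch below assumes NumPy is available; the helper name `induced_edge_matrix`, the tolerance and the five-node example graph are hypothetical choices made for illustration, not part of the original text.

```python
import numpy as np


def induced_edge_matrix(W, tol=1e-12):
    """Indicator matrix In[((d+1) I - W)^{-1}] for a symmetric 0/1 edge
    matrix W that has ones along the diagonal."""
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    inverse = np.linalg.inv((d + 1) * np.eye(d) - W)
    # The tolerance guards against rounding noise in entries that are zero in theory.
    return (np.abs(inverse) > tol).astype(int)


# Hypothetical undirected graph: path 0 - 1 - 2 and a separate edge 3 - 4.
W = np.eye(5, dtype=int)
for i, j in [(0, 1), (1, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1

W_closed = induced_edge_matrix(W)
```

Under these assumptions, `W_closed` has a block of ones for nodes {0, 1, 2} and another for nodes {3, 4}: each connected component becomes a complete subgraph, while the two components stay disconnected.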

Figs. 2 and 3 above illustrate, in particular, that a generating, undirected graph is typically different from a corresponding induced graph. The latter summarizes all independences of a defined type implied by, say, a starting $G^N_{\mathrm{par}}$. It can, in general, not be used to derive further implied independences of another type; exceptions are discussed here later. The concentration graph and the covariance graph induced by the $G^N_{\mathrm{par}}$ in Figs. 2 and 3 are both incomplete, connected graphs. If they were also generating graphs, this would, by Corollary 2 or by equation (13), give a contradiction.

In spite of the similarities of the two types of undirected graph, estimation of covariance graph structures is typically much more complex than estimation of concentration graph structures, [23], [33], [50]. The latter but not the former have, for instance, reduced sets of minimal sufficient statistics in exponential families with independence constraints, and for Gaussian distributions, there is a unique maximum of the likelihood function whenever there are fewer variables than observations (for ‘$p < n$’), [23].

Regression graphs 

Regression graphs are simple graphs with response nodes in a set $u$ and context nodes in a set $v$ such that for an ‘ordered split’ of the node set as $N = (u,v)$, the density of the response vector, $X_u$, is considered conditionally given the context variables in vector $X_v$ and the joint density factorizes as

$f_N = f_{u|v} f_v$. (14)

Furthermore, the response set $u$ has an ordered partition into connected components as $u = (g_1, \ldots, g_k, \ldots, g_K)$ so that for all nodes in the subgraph of $g_k$, the nodes in their past are in $g_{>k} = \{g_{k+1}, \ldots, g_K, v\}$ and

$f_{u|v} = \prod_{k=1}^{K} f_{g_k|g_{>k}}$. (15)

Simplifying conditional independences are captured by the ‘regression graph’. This simple graph uses Definition 1 and consists of a concentration graph for the context nodes, a conditional covariance graph for each of the ‘concurrent responses’, that is for $X_k$ within each connected component $g_k$, and a directed acyclic graph in the vector variables $(X_1, \ldots, X_K, X_v)$.

The set-up for a regression graph model starts with the response-vector variable $X_1$ of primary interest, possibly followed by one of secondary interest and ends with a context-vector variable $X_v$; for an example see Fig. 4.
Figure 4: A typical ordering of vector variables for a regression graph model. Box labels in the figure: primary responses, treatment variables, intermediate variables, treatment variables, context variables.


Intermediate variables form a sequence of variables between $X_1$ and $X_v$.

We let $G^N_{\mathrm{reg}}$ form a dependence base, so that edges present mean non-vanishing dependences; one ordering of the nodes is fixed so that it is compatible with a known or hypothesized generating process.

Definition 1 (Wermuth and Sadeghi, [112]). An $ij$-edge present in $G^N_{\mathrm{reg}}$ means

$i$ - - $j$ with $i,j$ in $g_k$: $i \pitchfork j \mid g_{>k}$,

$i$ <-- $j$ with $i$ in $g_k$ and $j$ in $g_{>k}$: $i \pitchfork j \mid g_{>k} \setminus \{j\}$,

$i$ --- $j$ with $i,j$ in $v$: $i \pitchfork j \mid v \setminus \{i,j\}$,

while for uncoupled pairs the dependence sign $\pitchfork$ is replaced by the independence sign $\perp\!\!\!\perp$, but the conditioning sets remain unchanged.

There are equivalent pairwise properties of $G^N_{\mathrm{reg}}$, important for interpretation, such as $i \perp\!\!\!\perp j \mid \mathrm{par}_i$ for uncoupled $(i,j)$ with $i \in g_k$ and $j \in g_{>k}$.

A distribution is said to be ‘generated over a regression graph’ when it satisfies the factorizations of equations (14), (15) while independences as well as dependences are specified by Definition 1 for a given node ordering $N = (1, \ldots, d)$. Note that with Definition 1, $G^N_{\mathrm{reg}}$ is unchanged for a reordering of the nodes within any response set $g_k$.

In a regression graph, three additional Vs may occur compared to those in a parent graph, see equation (2), and in the two types of undirected graph, see equation (12). With inner node $o$ marginalized over in the first two Vs and conditioned on in the third, the Vs (first row) and their induced edges (second row) are:

    i <-- o --- j,     i <-- o - - j,     i - - o <-- j
                                                            (16)
    i <-- j,           i <-- j,           i <-- j

With Definition 1 and a fixed compatible ordering of the nodes, three edge sets of different types are given for $G^N_{\mathrm{reg}}$ in a self-explanatory notation, as $E_{\text{- -}}$, $E_{\leftarrow}$, $E_{\text{---}}$. Their union defines one edge set $E$. Three different Vs are edge-inducing by conditioning on the inner node, see the last V on the right-hand side of equations (16), (12), (2). These are the ‘collision Vs’, the other five possible types of a $G^N_{\mathrm{reg}}$ are the ‘transmitting Vs’. Accordingly, the inner nodes are ‘collision nodes’ or ‘transmitting nodes’.

One justification for the types of induced edge stems from the construction of summary graphs, one class of ‘independence-preserving’ graphs, those which preserve all independences implied by a generating $G^N_{\mathrm{par}}$ or $G^N_{\mathrm{reg}}$ in a smaller graph obtained after marginalizing, conditioning and removing nodes as well as their edges, [99], [74], [76]. In particular, the above different types of Vs, plus two more that are used in constructing summary graphs, can be combined in any order in a consistent way, [92], Appendix.

There are other classes of independence-preserving graphs, not described here, by which additional implications of a generating graph may be derived from a smaller graph. These may have different types of edge and serve different purposes but define the same independence structures.
To derive independences implied by $G^N_{\mathrm{reg}}$, further concepts are useful. The notion of anteriors of a response $i$ extends the one of ancestors in $G^N_{\mathrm{par}}$ to $G^N_{\mathrm{reg}}$. For $N = (u,v)$ and $i$ <-- $j$, node $i$ in $g_k$ is within $u$, but the parent node $j$ can be any node in $g_{>k}$, the past of $i$. ‘Anterior paths’ join paths among context nodes in $v$ with an arrow to descendant-ancestor paths in $u$:

    i <-- o <-- o, ..., o <-- o_u <-- o --- o, ..., o --- o_v

Here the nodes reached along the arrows, up to and including $o_u$, are ancestors of $i$, and all nodes along the path, up to and including $o_v$, are anteriors of $i$.

Recall that an $a$-line path connects a node pair by a path with all inner nodes in a subset $a$ of $N$. With this, the notion of an ancestor graph of $G^N_{\mathrm{par}}$ can be extended.

Definition 2 An $a$-line anterior graph of $G^N_{\mathrm{reg}}$ has edge $i$ <-- $j$ for every $a$-line anterior $j$ of $i$ in $G^N_{\mathrm{reg}}$ and $a$-line paths for context nodes in $v$ are closed.

This graph permits one to express the effects of separation in $G^N_{\mathrm{reg}}$, [73], in a way comparable to those in undirected graphs, see Prop. 2 and 3. Again, let $N = (a,b)$, $a = \{\alpha, m\}$ and $b = \{\beta, c\}$, where only $m$ or $c$ may be empty.

Proposition 4 (Wermuth and Sadeghi). A regression graph implies $\alpha \perp\!\!\!\perp \beta \mid c$ if along every path between $\alpha$ and $\beta$, in the $a$-line anterior graph of $G^N_{\mathrm{reg}}$, a collision node intersects $m$ or a transmitting node intersects $c$.

The converse of Prop. 4 holds with Definition 1, and a fixed compatible ordering of the nodes. Prop. 4 specializes to the effects of separation in directed acyclic graphs; see [55], Criterion 1, also for a proof of equivalence to other path criteria for the independence implications of $G^N_{\mathrm{par}}$, [65], [33], [45].

Corollary 3 A path between $\alpha$ and $\beta$ in the $a$-line anterior graph of $G^N_{\mathrm{reg}}$ is edge-inducing if every collision node is in $c$ and every other node is in $m$.

Corollary 4 A path between $\alpha$ and $\beta$ in the $a$-line ancestor graph of $G^N_{\mathrm{par}}$ is edge-inducing if every sink node is in $c$ and every other node is in $m$.

In particular, $G^N_{\mathrm{reg}}$ and $G^N_{\mathrm{par}}$ induce a complete graph by marginalizing over $N$ if, for node 1, the last node $d$ is an anterior in $G^N_{\mathrm{reg}}$ or an ancestor in $G^N_{\mathrm{par}}$.

One further important question is whether two regression graphs with different types of edge can define the same independence structure if they have the same node set $N$ and an identical edge set $E = E_{\text{- -}} \cup E_{\leftarrow} \cup E_{\text{---}}$.

Proposition 5 (Wermuth and Sadeghi). Two regression graphs, with different types of edge but an identical node set $N$ and an identical edge set $E$, are Markov equivalent if and only if their sets of collision Vs coincide.

Thus for instance, a given regression graph is Markov equivalent to its induced concentration graph if and only if $G^N_{\mathrm{reg}}$ does not contain a collision V, and to its induced covariance graph if and only if $G^N_{\mathrm{reg}}$ does not contain any transmitting V. A covariance and a concentration graph are Markov equivalent if and only if they consist of identical sets of complete subgraphs.

Before we derive graphs induced by $G^N_{\mathrm{reg}}$, we introduce two basic types of Gaussian distributions that may get generated over a regression graph.
Two types of Gaussian regression graph model

We note first that for mean-centered variables and $N = (a,b)$, a ‘linear regression’ of a joint response $X_a$ on $X_b$ gives, [95],

$X_a = \Pi_{a|b} X_b + \eta_a$, $E(\eta_a) = 0$, $\mathrm{cov}(\eta_a, X_b) = 0$, $\mathrm{cov}(\eta_a)$ invertible. (17)

The parameters are a matrix of population least-squares regression coefficients $\Pi_{a|b}$, and a residual covariance matrix $\Sigma_{aa|b} = E(\eta_a \eta_a^T)$. The interpretation of $\Pi_{a|b}$ results by post-multiplication in the first equality of equation (17) with $X_b^T$ and taking expectations: $E(X_a X_b^T) - \Pi_{a|b} E(X_b X_b^T) = 0$.

Joint Gaussian distributions generated over a corresponding regression graph are non-degenerate, have a concentration matrix $\Sigma^{bb.a}$ for $X_b$, and zeros in the defined parameter matrices are given by Definition 1 for $K = 1$.

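The well-known relations of these parameter matrices, see [55], Appendix 1, to $\Sigma = \mathrm{cov}(X_N)$ and to $\Sigma^{-1}$ are for $N = (a,b)$:

$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} \Sigma^{aa} & \Sigma^{ab} \\ \Sigma^{ba} & \Sigma^{bb} \end{pmatrix},$$

$$\Sigma_{aa|b} = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} = (\Sigma^{aa})^{-1},$$
$$\Pi_{a|b} = \Sigma_{ab} \Sigma_{bb}^{-1} = -(\Sigma^{aa})^{-1} \Sigma^{ab}, \qquad (18)$$
$$\Sigma^{bb.a} = \Sigma^{bb} - \Sigma^{ba} (\Sigma^{aa})^{-1} \Sigma^{ab} = \Sigma_{bb}^{-1},$$

where the expressions for $\Sigma^{bb.a}$ and $\Sigma_{aa|b}$ are the matrix forms of the recursion relations for concentrations and covariances in equation (6). As is explained later, these matrix results can all be obtained by applying the matrix operator named partial inversion. The result analogous to Corollary 1 is the following direct consequence of equation (18):

Corollary 5 For any subset $a$ of $N$, marginalizing in $\Sigma^{-1}$ over $a$ gives $\Sigma_{aa|b}$ and $\Pi_{a|b}$, while conditioning in $\Sigma^{-1}$ on $a$ leaves the submatrix $\Sigma^{bb}$ unchanged. For $b$ subset of $N$, conditioning in $\Sigma$ on $b$ gives $\Sigma^{bb.a}$ and $\Pi_{a|b}$ while marginalizing in $\Sigma$ over $b$ leaves the submatrix $\Sigma_{aa}$ unchanged.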
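The identities in equation (18) are easy to confirm numerically. The following sketch assumes NumPy; the particular covariance matrix, the split into `a` and `b`, and all variable names are hypothetical and chosen only so that the matrix is positive definite.

```python
import numpy as np

# Hypothetical positive definite covariance matrix, built as L L^T with an
# invertible lower-triangular L.
L = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 1.0]])
sigma = L @ L.T
conc = np.linalg.inv(sigma)          # concentration matrix

a, b = [0, 1], [2, 3]                # ordered split N = (a, b)
S_aa, S_ab = sigma[np.ix_(a, a)], sigma[np.ix_(a, b)]
S_ba, S_bb = sigma[np.ix_(b, a)], sigma[np.ix_(b, b)]
C_aa, C_ab = conc[np.ix_(a, a)], conc[np.ix_(a, b)]
C_ba, C_bb = conc[np.ix_(b, a)], conc[np.ix_(b, b)]

# Left-hand forms of equation (18), computed from the covariance matrix
# (first two) and from the concentration matrix (third).
sigma_aa_given_b = S_aa - S_ab @ np.linalg.inv(S_bb) @ S_ba
pi_a_given_b = S_ab @ np.linalg.inv(S_bb)
conc_bb_marg_a = C_bb - C_ba @ np.linalg.inv(C_aa) @ C_ab

# Compare with the right-hand forms of equation (18).
checks = {
    "residual covariance": np.allclose(sigma_aa_given_b, np.linalg.inv(C_aa)),
    "regression coefficients": np.allclose(pi_a_given_b, -np.linalg.inv(C_aa) @ C_ab),
    "marginal concentration": np.allclose(conc_bb_marg_a, np.linalg.inv(S_bb)),
}
```

Each entry of `checks` compares the form based on the covariance matrix with the form based on the concentration matrix, so the dictionary makes the three equalities of equation (18) explicit for one example.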




This applies, in similar form also, for $a = (\alpha, \gamma)$, to $\Sigma_{\alpha\alpha|b}$ and with $b = (\beta, \delta)$ to $\Sigma^{\beta\beta.a}$, that is marginalizing in any covariance matrix leads to a submatrix and conditioning in any concentration matrix leads to a submatrix, while more non-vanishing parameters may get induced, otherwise.

For $a = (\alpha, \gamma)$ and $b = (\beta, \delta)$, marginalizing over $\gamma$ and conditioning on $\delta$ gives also a submatrix: $\Pi_{\alpha|\beta.\delta}$, where $\alpha$ indicates the response, $\beta$ the regressor and $\delta$ the remaining regressors conditioned on:

$$\Pi_{a|b} = \begin{pmatrix} \Pi_{\alpha|\beta.\delta} & \Pi_{\alpha|\delta.\beta} \\ \Pi_{\gamma|\beta.\delta} & \Pi_{\gamma|\delta.\beta} \end{pmatrix}. \qquad (19)$$

Thus, by Corollary 5 and the same partition as in equation (19), the parameters for $f_{\alpha|\beta\delta}$ and $f_{\beta|\delta}$ are simply submatrices of those for $f_{a|b}$ and $f_b$:

$\Sigma_{\alpha\alpha|b} = [\Sigma_{aa|b}]_{\alpha,\alpha}$, $\Pi_{\alpha|\beta.\delta} = [\Pi_{a|b}]_{\alpha,\beta}$, $\Sigma^{\beta\beta.a} = [\Sigma^{bb.a}]_{\beta,\beta}$. (20)
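The matrices $\Sigma_{aa|b}$, $\Pi_{a|b}$, $\Sigma^{bb.a}$ arise also, with $I$ denoting an identity matrix, in orthogonalized equations corresponding to equations (17):

$$\begin{pmatrix} I_{aa} & -\Pi_{a|b} \\ 0 & \Sigma_{bb}^{-1} \end{pmatrix} \begin{pmatrix} X_a \\ X_b \end{pmatrix} = \begin{pmatrix} \eta_a \\ \Sigma_{bb}^{-1} X_b \end{pmatrix}. \qquad (21)$$

Note that $\mathrm{cov}(\Sigma_{bb}^{-1} X_b) = \Sigma_{bb}^{-1}$ so that this concentration matrix plays several roles.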

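For a Gaussian regression graph model, recall that equations (14), (15) and Definition 1 apply. Covariance matrices after regressing $X_{g_k}$ on $X_{g_{>k}}$ and $\Sigma_{vv}^{-1}$ are in a block-diagonal matrix $W_{NN}$. The matrix of equation parameters, $H_{NN}$, is upper block-triangular with identity matrices in the sizes of $g_k$ along the diagonal, $\Sigma_{vv}^{-1}$ in the last block and off-diagonally $-\Pi_{g_k|g_{>k}}$:

$H_{NN} X_N = \eta_N$ with $W_{NN} = \mathrm{cov}(\eta_N)$. (22)

As an example, we choose $K = 2$ and $u = (\alpha, \gamma)$:

$$H_{NN} = \begin{pmatrix} I_{\alpha\alpha} & -\Pi_{\alpha|\gamma.v} & -\Pi_{\alpha|v.\gamma} \\ 0 & I_{\gamma\gamma} & -\Pi_{\gamma|v} \\ 0 & 0 & \Sigma_{vv}^{-1} \end{pmatrix}, \qquad W_{NN} = \begin{pmatrix} \Sigma_{\alpha\alpha|\gamma v} & 0 & 0 \\ 0 & \Sigma_{\gamma\gamma|v} & 0 \\ 0 & 0 & \Sigma_{vv}^{-1} \end{pmatrix}.$$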

Equation (22) implies for the single joint response regression of $X_u$ on $X_v$

$\Pi_{u|v} = -H_{uu}^{-1} H_{uv}$, $\Sigma_{uu|v} = H_{uu}^{-1} W_{uu} (H_{uu}^{-1})^T$, (23)

hence with equations (18) also simple matrix expressions for $\Sigma_{uN}$ and $\Sigma_{NN}$. The edge sets of $G^N_{\mathrm{reg}}$ are captured by edge matrices $\mathcal{H}_{NN}$ and $\mathcal{W}_{NN}$.


Definition 3 We denote the dimension of $g_k$ by $d_k$, the one of $v$ by $d_v$, so that $d = \sum_{k=1}^{K} d_k + d_v$ for the ordered node set $N = (1, \ldots, d)$. The edge matrix, $\mathcal{H} = (\mathcal{H}_{ij})$, is upper block-triangular, with $K$ identity matrices of size $d_k \times d_k$ along the diagonal and a symmetric edge matrix for the concentration graph of $X_v$ alone in the last block. In the upper, off-diagonal parts are ones for arrows pointing in from $g_{>k}$ to $g_k$:

$\mathcal{H}_{ij} = 1$ if and only if $i$ <-- $j$ or $i$ --- $j$ in $G^N_{\mathrm{reg}}$ or $i = j$, and $\mathcal{H}_{ij} = 0$ otherwise. (24)

The edge matrix, $\mathcal{W}_{uu} = (\mathcal{W}_{ij})$, for dashed lines, is block-diagonal with $K$ symmetric $d_k \times d_k$ edge matrices for covariance graphs of $X_k$ given $X_{>k}$:

$\mathcal{W}_{ij} = 1$ if and only if $i$ - - $j$ in $G^N_{\mathrm{reg}}$ or $i = j$, and $\mathcal{W}_{ij} = 0$ otherwise. (25)

For $\mathcal{W}_{NN}$, the last block is taken to be $\mathcal{W}_{vv} = \mathcal{I}_{vv}$ because the full line edges of the concentration graph of $X_v$ are already captured by $\mathcal{H}_{vv}$.

For nodes $u$ forming a single joint response and $v$ its regressors, one gets e.g.

$\mathcal{P}_{u|v} = \mathrm{In}[\mathcal{H}_{uu}^{-} \mathcal{H}_{uv}]$, $\mathcal{S}_{uu|v} = \mathrm{In}[\mathcal{H}_{uu}^{-} \mathcal{W}_{uu} (\mathcal{H}_{uu}^{-})^T]$, (26)

as the induced edge matrices for equation (23). Note that some induced edge matrices are denoted by using a close calligraphic equivalent to the parameters in the Gaussian case. These are then either edge matrices of a starting graph, or their submatrices, or they can be derived directly in terms of the matrix operator described next. All others get $\mathcal{N}$ as notation.
Partial closure

We now let the ordered node set $N = (1, \ldots, d)$ denote also the rows and columns of an edge matrix $\mathcal{M}$ containing Vs of one type. Then, for an ordered partitioning as $N = (a,b)$, the ‘partial closure’ operator, denoted by $\mathrm{zer}_a \mathcal{M}$, closes $a$-line paths in a corresponding graph with edge matrix $\mathcal{M}$ and finds structural zeros induced in a corresponding parameter matrix $M$ of a Gaussian distribution.

Definition 4 The partial closure operator is, with $a = \{1\}$,

$$\mathcal{M} = \begin{pmatrix} 1 & v^T \\ w & m \end{pmatrix}, \qquad \mathrm{zer}_{\{1\}} \mathcal{M} = \begin{pmatrix} 1 & v^T \\ w & \mathrm{In}[m + w v^T] \end{pmatrix},$$

and $\mathrm{zer}_a \mathcal{M}$, for $t > 1$ elements in $a$, may be thought of as applying the above operation $t$ times, using repeatedly appropriate permutations of $\mathcal{M}$.

An off-diagonal element for $i, j \neq k$ of $\mathrm{zer}_{\{k\}} \mathcal{M}$ contains an additional one compared to $\mathcal{M}$ if and only if $\mathcal{M}_{ij} = 0$ and $\mathcal{M}_{ik} \mathcal{M}_{kj} = 1$, hence indicating the presence of a V in the graph.

The operator preserves all ones of $\mathcal{M}$ and it closes paths with Vs which must be of the same type in the graph represented by $\mathcal{M}$.

Proposition 6 (Wermuth, Wiedenbeck and Cox). Partial closure is commutative, cannot be undone and is exchangeable with taking submatrices.
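Definition 4 translates directly into a few lines of code. The sketch below is one possible reading of the definition in plain Python, with 0-based node labels; the function name `partial_closure` and both example graphs are assumptions for illustration only.

```python
def partial_closure(edge_matrix, a):
    """Apply zer_a to a 0/1 edge matrix: for each k in a, set entry (i, j)
    to one whenever entries (i, k) and (k, j) are both one, for i, j != k.
    Row k and column k themselves are left unchanged in that step."""
    m = [[1 if x else 0 for x in row] for row in edge_matrix]
    d = len(m)
    for k in a:
        for i in range(d):
            if i == k or not m[i][k]:
                continue
            for j in range(d):
                if j != k and m[k][j]:
                    m[i][j] = 1
    return m


# Hypothetical directed acyclic graph 0 <-- 1 <-- 2 <-- 3 (upper-triangular).
dag = [[1, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 1, 1],
       [0, 0, 0, 1]]
closed_dag = partial_closure(dag, [0, 1, 2, 3])      # transitive closure

# Hypothetical undirected graph: path 0 - 1 - 2 and an isolated node 3.
undirected = [[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]]
closed_undirected = partial_closure(undirected, [0, 1, 2, 3])
```

For these two examples, closing on all nodes fills the upper triangle of the directed chain and completes the connected component {0, 1, 2} of the undirected graph, while the isolated node stays isolated. Applying the function again to a closed matrix changes nothing, which is the sense in which the operation cannot be undone.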

One may for instance get the edge matrix $\mathcal{A}^{-}$, that is obtain the transitive closure of a directed acyclic graph, and $\mathcal{W}^{-}$ of equation (13), that is complete all connected components of an undirected graph, with $N = \{a, b\}$ as

$\mathrm{zer}_b \mathrm{zer}_a \mathcal{A} = \mathrm{zer}_N \mathcal{A} = \mathcal{A}^{-}$, $\mathrm{zer}_b \mathrm{zer}_a \mathcal{W} = \mathrm{zer}_N \mathcal{W} = \mathcal{W}^{-}$.

By Definition 1 and Prop. 6, $\mathrm{zer}_a \mathcal{A}$ for $G^N_{\mathrm{par}}$ remains unit upper-triangular in the starting order, $\mathrm{zer}_a \mathcal{W}$ for $G^N_{\mathrm{con}}$ remains symmetric and disconnected components, such as the graphs for conditional covariances of different joint responses in $G^N_{\mathrm{reg}}$, remain disconnected.

The edge matrices $(\mathcal{P}_{u|v}, \mathcal{S}_{uu|v})$ in equation (26), induced by $G^N_{\mathrm{reg}}$ for $f_{u|v}$ with a single joint response $u$, may be obtained with $[\mathrm{zer}_u \mathcal{H}_{NN}]_{u,v}$. For $a = (\alpha, \gamma)$ and $b = (\beta, \delta)$, the edge matrix components for $f_{\alpha|\beta\delta}$ and $f_{\beta|\delta}$ as induced by $G^N_{\mathrm{reg}}$ with $f_{a|b}$ and $f_b$ are given by the subgraph of $\alpha \cup \beta$, just as the Gaussian parameters in equation (20) are given by submatrices.

Algorithms for finding the transitive closure in directed graphs, possibly containing cycles, started to be developed independently in the Russian, French and American computer science literature; for a recent survey see [92]. Algorithms for finding connected components for general graphs are also still being developed.

One advantage of partial closure is that its properties justify stepwise procedures using just the Vs in a $G^N_{\mathrm{reg}}$. Another is that properties of this matrix operator prove some features of the regression graph transformations.
Edge matrices induced by $G^N_{\mathrm{reg}}$

The edge matrix of the $a$-line anterior graph of $G^N_{\mathrm{reg}}$, see Definition 2, arises for $a$ any subset of $N$, $b = N \setminus a$ and a reordering as $N = (a,b)$ with:

$\mathcal{K}_{NN} = \mathrm{zer}_a \mathcal{H}_{NN}$. (27)

This operation closes all full, $a$-line paths within $v$ and for each $i$ in $u$, it turns every $a$-line anterior $j$ into a parent of $i$. Similarly,

$\mathcal{P}_{NN} = \mathrm{zer}_b \mathcal{W}_{NN}$ (28)

closes all dashed, $b$-line paths in the conditional covariance graphs of the responses.

Thus, the two partial closure operations in equations (27), (28) close all of the following four types of Vs, where $o_g$ denote nodes in a subset $g$ of $N$:

    i_N <-- o_a <-- j_N,   i_N <-- o_a --- j_v,   i_v --- o_a --- j_v,   i_u - - o_b - - j_u

and the types of induced edge are as specified in equations (2), (12), (16) for $o_a$, a node to be marginalized over, and $o_b$, a node to be conditioned on. These induced edges preserve the ordered split of the nodes, $N = (u,v)$. The corresponding model may be interpreted as a covering model, one with fewer constraints than the reduced model specified by the generating $G^N_{\mathrm{reg}}$, [16].

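Four types of V remain to be closed for consequences of $G^N_{\mathrm{reg}}$ with $f_{u|v} f_v$ for $f_{a|b} f_b$:

    i_a <-- o_a --> j_a,   i_a <-- o_a - - j_b,   i_a - - o_b <-- j_b,   i_b --> o_b <-- j_b.

To achieve this, $\mathcal{P}_{uu}$ is combined with $\mathcal{K}_{vv}$ to give $\mathcal{Q}_{NN}$:

$\mathcal{Q}_{uu} = \mathcal{P}_{uu}$, $\mathcal{Q}_{vv} = \mathcal{K}_{vv}$, $\mathcal{Q}_{uv} = 0$, $\mathcal{Q}_{vu} = 0$. (29)

Then, these remaining Vs are closed with the following edge matrix products:

$\mathrm{In}[\mathcal{K}_{aa} \mathcal{Q}_{aa} \mathcal{K}_{aa}^T]$ gives for $i_a$ <-- $k_a$ - - $l_a$ --> $j_a$ and $i_a$ <-- $k_a$ --- $l_a$ --> $j_a$ a complete covariance graph,

$\mathrm{In}[\mathcal{K}_{aa} \mathcal{P}_{ab} \mathcal{K}_{bb}]$ gives for $i_a$ <-- $k_a$ - - $l_b$ <-- $j_b$ a complete graph of response nodes $\{i_a, k_a\}$ and regressor nodes $\{l_b, j_b\}$,

$\mathrm{In}[\mathcal{H}_{bb}^T \mathcal{P}_{bb} \mathcal{H}_{bb}]$ gives for $i_b$ --> $k_b$ - - $l_b$ <-- $j_b$ a complete concentration graph.

This leads to the edge matrix components, $\mathcal{N}_{a|b}$, $\mathcal{N}_{aa|b}$, of the graph for regressing $X_a$ on $X_b$ and, $\mathcal{N}^{bb.a}$, for the concentration graph of $X_b$, as induced by $G^N_{\mathrm{reg}}$; induced arrows point from regressor $X_b$ to response $X_a$.

Proposition 7 (Wermuth). Edge matrix components induced by $G^N_{\mathrm{reg}}$ for $N = (a,b)$, by marginalizing over any $a \subset N$, conditioning on $b = N \setminus a$, are

$\mathcal{N}_{aa|b} = \mathrm{In}[\mathcal{K}_{aa} \mathcal{Q}_{aa} \mathcal{K}_{aa}^T]$,

$\mathcal{N}_{a|b} = \mathrm{In}[\mathcal{K}_{ab} + \mathcal{K}_{aa} \mathcal{P}_{ab} \mathcal{K}_{bb}]$,

$\mathcal{N}^{bb.a} = \mathrm{In}[\mathcal{H}_{bb}^T \mathcal{P}_{bb} \mathcal{H}_{bb}]$.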
The zeros in these induced edge matrices represent also the structural zeros in $\Sigma_{aa|b}$, $\Pi_{a|b}$, $\Sigma^{bb.a}$. As in Definition 1 for $K = 1$, these $ij$-zeros mean:

$i \perp\!\!\!\perp j \mid b$ in $\mathcal{N}_{aa|b}$, $i \perp\!\!\!\perp j \mid b \setminus j$ in $\mathcal{N}_{a|b}$, $i \perp\!\!\!\perp j \mid b \setminus \{i,j\}$ in $\mathcal{N}^{bb.a}$.

Example 1 For $G^N_{\mathrm{par}}$ of Fig. 5 with edge matrix $\mathcal{A}$, marginalizing with an order-respecting split, $a = \{1,2,3\}$, and conditioning on $b = N \setminus a$ gives with $\mathcal{A}_{aa}^{-} = \mathcal{K}_{aa}$ and $\mathcal{K}_{ab} = \mathrm{In}[\mathcal{A}_{aa}^{-} \mathcal{A}_{ab}]$ a direct generalization of equation (10):

$\mathcal{N}_{aa|b} = \mathrm{In}[\mathcal{A}_{aa}^{-} (\mathcal{A}_{aa}^{-})^T]$, $\mathcal{N}_{a|b} = \mathcal{K}_{ab}$, $\mathcal{N}^{bb.a} = \mathrm{In}[\mathcal{A}_{bb}^T \mathcal{A}_{bb}]$.

Figure 5: Left: the generating parent graph, right: induced graph for one set of response nodes, $a = \{1, 2, 3\}$, and one set of regressor nodes $b = \{4, 5, 6\}$.

An alternative to the edge matrix results is to use Corollary 1 to derive, separately for each missing edge in $G^N_{\mathrm{par}}$, the consequences of the new conditioning sets specified by $N = (a,b)$ for the regression graph of Fig. 5 on the right, which has only one joint response, the one of nodes 1, 2, 3.
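The three edge matrix products of Example 1 can be tried out on a small graph. The sketch below does not reproduce the graph of Fig. 5; it uses a hypothetical five-node parent graph with 0-based labels, and all helper names are assumptions made for illustration.

```python
def matmul(p, q):
    """Plain matrix product of two lists of lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def indicator(m):
    """In[.]: replace every nonzero entry by one."""
    return [[1 if x else 0 for x in row] for row in m]


def transpose(m):
    return [list(row) for row in zip(*m)]


def block(m, rows, cols):
    return [[m[i][j] for j in cols] for i in rows]


def transitive_closure(a):
    """Edge matrix of the ancestor graph, via repeated indicator products."""
    closed = indicator(a)
    for _ in range(len(a)):
        closed = indicator(matmul(closed, a))
    return closed


# Hypothetical parent graph: 0 <-- 1, 0 <-- 4, 1 <-- 2, 2 <-- 3, 2 <-- 4.
A = [[1, 1, 0, 0, 1],
     [0, 1, 1, 0, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1]]
a, b = [0, 1], [2, 3, 4]           # order-respecting split N = (a, b)

K_aa = transitive_closure(block(A, a, a))
K_ab = indicator(matmul(K_aa, block(A, a, b)))            # induced arrows
N_aa_given_b = indicator(matmul(K_aa, transpose(K_aa)))   # dashed lines
A_bb = block(A, b, b)
N_bb_marg_a = indicator(matmul(transpose(A_bb), A_bb))    # full lines
```

In this hypothetical graph, `K_ab` contains an induced arrow 0 <-- 2, which results from the transition V (0, 1, 2) because node 1 is a response and no longer conditioned on, and `N_bb_marg_a` contains an induced full line between nodes 3 and 4, which results from the sink V (3, 2, 4).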

Example 2 A generating $G^N_{\mathrm{par}}$ for determinants of the well-being of diabetic patients having a lower level of formal schooling, $Y$, is given in Fig. 6, left. The following description of this graph attaches to it a plausible, substantive story. This uses statistical results not given here.

Figure 6: Left: a generating $G^N_{\mathrm{par}}$ with nodes $Y$ (glucose control), $X$ (knowledge about diabetes), $Z$ (external fatalistic attribution) and $W$ (duration of illness); right: the induced graph for regression of $(Y,X)$ on $(W,Z)$; $Y$ <-- $Z$ induced by implicitly marginalizing over $X$ in $G^N_{\mathrm{par}}$ to obtain the joint response $(Y,X)$.

Glucose control improves, the more a patient knows about diabetes and the longer ago diabetes was diagnosed. Thus, glucose control depends directly on the knowledge about the illness, $X$, and on the time since the illness was diagnosed, $W$, hence $Y \pitchfork X \mid W$ and $Y \pitchfork W \mid X$. Knowledge, $X$, is better, the lower the external fatalistic attribution, $Z$, that is the less patients tend to think that their well-being depends mainly on their physicians, so that $X \pitchfork Z$. And, fatalistic attribution, $Z$, decreases with the time since diagnosis, $W$, so that $Z \pitchfork W$. This well-fitting graph contains the direction-preserving path $(Y,X,Z,W)$. This path, together with the type of involved dependences, suggests that intervening on the variables along it may improve the well-being of diabetic patients.

By constructing induced graphs, one can answer queries like: which additional dependences result from a given generating process by using another type of process for the same variables? This may for instance arise in empirical studies when researchers disagree on the ordering of the variables. In the example with $X$ as primary and $Y$ as secondary response, $X$ <-- $W$ would result due to conditioning on $Y$ in the sink V, $(X,Y,W)$, in the starting graph, while for $Y$, $X$ as a joint response in Fig. 6 on the right, the arrow $Y$ <-- $Z$ is added due to marginalizing over the inner node $X$ of the transition V, $(Y,X,Z)$, in the starting graph.

Edge criteria for effects of separation in $G^N_{\mathrm{reg}}$

By Corollary 1 and by $\mathcal{N}_{a|b}$ representing the edge matrix induced by $G^N_{\mathrm{reg}}$ for the bipartite graph of arrows when $X_a$ is regressed on $X_b$, submatrices of the edge matrices in Prop. 7 give also the structural zeros induced by $G^N_{\mathrm{reg}}$ for the joint conditional distribution with density $f_{\alpha\beta|c} = f_{\alpha|\beta c} f_{\beta|c}$:

$\mathcal{N}_{\alpha|\beta.c} = [\mathcal{N}_{a|b}]_{\alpha,\beta}$, $\mathcal{N}_{\alpha\alpha|b} = [\mathcal{N}_{aa|b}]_{\alpha,\alpha}$.

Proposition 8 (Wermuth). A regression graph $G^N_{\mathrm{reg}}$ with edges given by Definition 1 implies $\alpha \perp\!\!\!\perp \beta \mid c$ if $\mathcal{N}_{\alpha|\beta.c} = 0$ and it implies $\alpha \pitchfork \beta \mid c$ if $\mathcal{N}_{\alpha|\beta.c} \neq 0$.

Thus, the absence of ones in a matrix indicates directly a queried independence and the presence of ones shows where dependences occur. Instead, with any path criterion, one has to study the properties of paths before a decision can be reached. This may get cumbersome in large graphs when one has to check for each collision V whether its collision node is within the anterior set of $c$.

Properties of regression graphs 

A regression graph, $G^N_{\mathrm{reg}}$, shares the three properties in Prop. 1 of a joint Gaussian distribution generated over $G^N_{\mathrm{reg}}$. It is dependence-inducing and independences combine downwards and upwards, that is it satisfies singleton transitivity, intersection and composition, in addition to the general properties of all probability distributions.

Its composition and intersection property have been proven in general and were discussed above for just three variables. Singleton transitivity requires an additional independence involving node $h$, say, if the conditioning set for independence of $i,j$ includes and excludes $h$. For $i,j,h$ distinct nodes of $N$ and $c$ a subset of $N \setminus \{i,j,h\}$

$(i \perp\!\!\!\perp j \mid c$ and $i \perp\!\!\!\perp j \mid hc) \implies (i \perp\!\!\!\perp h \mid c$ or $j \perp\!\!\!\perp h \mid c)$. (30)

The equivalent statement ‘for ($i \pitchfork h \mid c$ and $j \pitchfork h \mid c$), either $i \perp\!\!\!\perp j \mid c$ can hold or $i \perp\!\!\!\perp j \mid hc$ but not both’, was proven with equations (10) and (11) for two types of V in $G^N_{\mathrm{par}}$. The same types of argument prove singleton transitivity of $G^N_{\mathrm{reg}}$. Recall that traceable regressions satisfy these same three properties, hence ‘mimic independence properties of a Gaussian distribution’ generated over $G^N_{\mathrm{reg}}$.

For ‘set transitivity’ as defined in the literature, the single node $h$ in equation (30) is replaced by a subset of $N$, disjoint of $\{i,j,c\}$. Set transitivity is for instance violated when both of the independence structures hold which are defined with the concentration and with the covariance graph in Fig. 7.

This may happen in Gaussian distributions, but not for undirected graphs; see Corollary 2. More generally, since graphs induced by $G^N_{\mathrm{reg}}$ may be derived by partial closure and by adding products of binary matrices, contributions of several paths to a conditional dependence of any node pair $i,j$ can never cancel out.

    h --- j        h - - j
    |     |        :     :
    i --- k        i - - k

Figure 7: Left: concentration graph with $i \perp\!\!\!\perp j \mid \{h,k\}$ and $h \perp\!\!\!\perp k \mid \{i,j\}$; right: covariance graph with $i \perp\!\!\!\perp j$ and $h \perp\!\!\!\perp k$; connector for $i,j$ is $\{h,k\}$ in both.

Proposition 9 (Wermuth and Sadeghi). The structures captured and induced by $G^N_{\mathrm{reg}}$ are like traceable regressions but having and inducing exclusively positive dependences.

We show next how source, transition and sink Vs of $G^N_{\mathrm{par}}$ in equation (2) generalize to source, transition and sink Us in Fig. 8. By remembering the path ends for the four $ij$-paths in Fig. 8, induced are either a dashed $ij$-line, an $ij$-arrow or a full $ij$-line; see equations (2), (12), (16) for the involved, repeated closing of Vs.

Figure 8: Types of U with undirected edges and arrows to or from $i,j$. The first two on the left: source U; the third: transition U; the fourth on the right: sink U.

We now let $i$ and $j$ be again an uncoupled node pair of $N$. Let the sets $\delta \neq \emptyset$ and $c$ be disjoint subsets of $N \setminus \{i,j\}$. Then, $\delta$ is called a ‘connector’ if $G^N_{\mathrm{reg}}$ implies ($i \perp\!\!\!\perp j \mid \delta c$ and $i \pitchfork j \mid c$) or ($i \perp\!\!\!\perp j \mid c$ and $i \pitchfork j \mid \delta c$) and the inner nodes of undirected $ij$-paths exhaust the nodes of $\delta$. With this definition, a previous claim of set transitivity of $G^N_{\mathrm{reg}}$ can be corrected as follows.

Proposition 10 (Wermuth and Sadeghi, [113]). Regression graphs are connector-transitive, compositional graphoids.
Equation (30) is changed into connector transitivity by replacing the single node $h$ by a connector $\delta$. Connector-transitivity extends singleton-transitivity. It concerns chordless cycles in undirected graphs; the simplest are in Fig. 7. Furthermore, it concerns Us with mixed edges inducing an undirected $ij$-edge. These have two incoming or two outgoing arrows at $i,j$ and an undirected path via the nodes of $\delta$. It is the exchangeability property of partial closure which permits one to argue by just using subgraphs.

Next, we describe the operator which corresponds closely to partial closure since it transforms parameter matrices for Gaussian distributions generated over $G^N_{\mathrm{reg}}$ in a similar way as partial closure modifies edge matrices.


Partial inversion 


Let N = (1 ,,d) denote the rows and columns of a real-valued matrix M having 
invertible leading principal submatrices, where M connects real-valued vectors x and y 
as Mx — y. The ‘partial inversion' operator, denoted by inv a lVf, for ‘a’ any subset 
of N and an ordering as N — (a, b ), exchanges argument and image relating to a, [ 114] . 
[115]. that is 

M ( 2) = ( vi ) is tumed into: hw “ M () = ( 2 ) ■■ (31) 

Applied for instance to rows a of £= £, two correlated sets of equations turn 
directly into two sets of orthogonalized equations; see equation d2T]) . 


Definition 5 The partial inversion operator is, with $a = \{1\}$,

$$M = \begin{pmatrix} s & v^{T} \\ w & m \end{pmatrix}: \qquad \mathrm{inv}_{\{1\}} M = \begin{pmatrix} 1/s & -v^{T}/s \\ w/s & m - wv^{T}/s \end{pmatrix},$$

and $\mathrm{inv}_a M$, for $t > 1$ elements in $a$, may be thought of as applying the above operation $t$ times, by using repeatedly appropriate permutations of $M$.


Partial inversion, m, generalizes the sweep operator, ra. ra, and other methods for Gaussian elimination, [HB], to non-symmetric matrices. The matrix $m - wv^{T}/s$ is a Schur complement, ra-. A small modification of the sweep operator leads to the ‘symmetric difference’, so that an action on $a$, say, is undone by using this same operator again on $a$.


Proposition 11 Wermuth, Wiedenbeck and Cox, m-. Partial inversion is commutative, can be undone and is exchangeable with taking submatrices.
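
The single-index operation of Definition 5 and two of the properties in Proposition 11 can be checked numerically. The following is a minimal sketch only: the function name `partial_inversion`, the 0-based indexing and the example matrix are illustrative assumptions and are not taken from the cited papers. Exact rational arithmetic is used so that equalities hold without rounding.

```python
from fractions import Fraction


def partial_inversion(M, a):
    """Partially invert the square matrix M on the 0-based indices in a.

    Each single step follows Definition 5 with pivot s = M[k][k]:
    pivot -> 1/s, pivot row -> -row/s, pivot column -> column/s,
    every other entry -> m - w v^T / s (a Schur complement).
    """
    N = [[Fraction(x) for x in row] for row in M]
    d = len(N)
    for k in a:
        s = N[k][k]
        if s == 0:
            raise ZeroDivisionError("zero pivot: partial inversion undefined")
        R = [[None] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                if i == k and j == k:
                    R[i][j] = 1 / s
                elif i == k:
                    R[i][j] = -N[k][j] / s
                elif j == k:
                    R[i][j] = N[i][k] / s
                else:
                    R[i][j] = N[i][j] - N[i][k] * N[k][j] / s
        N = R
    return N


def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical non-symmetric example with non-zero pivots.
M = [[2, 1, 0], [1, 3, 1], [4, 1, 5]]
x = [Fraction(1), Fraction(-2), Fraction(3)]
y = matvec(M, x)

# Equation (31) with a = {0}: argument and image are exchanged for index 0.
swapped = matvec(partial_inversion(M, [0]), [y[0], x[1], x[2]])

# Proposition 11: the order of the indices is irrelevant, and repeating
# the operation on the same index undoes it.
same_order = partial_inversion(M, [0, 2]) == partial_inversion(M, [2, 0])
undone = partial_inversion(partial_inversion(M, [1]), [1])
```

In this sketch `swapped` reproduces the vector with entries $x_1, y_2, y_3$, and inverting on all indices yields the ordinary matrix inverse, which is one way to see partial inversion as a stepwise Gaussian elimination.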

In particular, the operator gives $\mathrm{inv}_b\,\Sigma = \mathrm{inv}_a\,\Sigma^{-1}$ and the corresponding three Gaussian parameter matrices in equation (TT8jh. The Schur complements involved in the two operations, $\mathrm{inv}_b\,\Sigma$ and $\mathrm{inv}_a\,\Sigma^{-1}$, are matrix forms of the recursion relations in equation (EJ). A matrix form of the recursion relation for regression coefficients arises by partial inversion on $v$ in the matrix example to equation (22).
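
The identity linking partial inversion of $\Sigma$ and of $\Sigma^{-1}$ can be checked by hand in the smallest case. The worked case below is added for illustration; it assumes $d = 2$, $a = \{1\}$, $b = \{2\}$, a single correlation $\rho$, and uses only Definition 5.

```latex
\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},
\qquad
\Sigma^{-1} = \frac{1}{1-\rho^{2}}
  \begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix},
\qquad
\mathrm{inv}_{\{2\}}\,\Sigma
  = \begin{pmatrix} 1-\rho^{2} & \rho \\ -\rho & 1 \end{pmatrix}
  = \mathrm{inv}_{\{1\}}\,\Sigma^{-1}.
```

Here $1 - \rho^{2}$ is the Schur complement, that is the conditional variance of the first variable given the second, $\rho$ is the regression coefficient and the entry 1 is the marginal variance of the regressor.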

By starting from a general regression graph model in equation (22), the parameter matrices $H_{NN}$ and $W_{NN}$ and $N = (u, v)$ are given. Parameter transformations that are analogous to those of the edge matrices of Prop. 7 have been derived using the partial inversion operator and sums of matrix products, [100]. In contrast to partial closure, partial inversion may lead to negative elements in the induced matrices and may therefore permit path cancellations.

Some special aspects

For many regression graph models, the parameters in the regressions of $X_k$ given the past $X_{>k}$ can be estimated by using standard methods, [58], [95], [2], but some are based on special multivariate models, [35], [53], [72], [2B|, or on special features of the data, [27], 0 U, hdj. Possible shortcomings have been identified for some estimation methods, m, and for some models, ra, [60]. New estimation results are needed for joint responses of both categorical and quantitative components; exceptions are CG-regressions, 021.®-

Features of special models may give unexpected insights and often lead to simplified properties. For instance, parent graphs without any transition V, shown in Fig. [T], are lattice conditional independence models, [6]. For these models, $G^N_{\mathrm{par}}$ coincides with the ancestor graph. Hence, the separating paths of Prop. Q] apply directly to $G^N_{\mathrm{reg}}$. Parent graphs of exclusively source Vs are labelled trees, m. These have exactly one path connecting each node pair and $\alpha \perp\!\!\!\perp \beta \mid c$ if every path between $\alpha$ and $\beta$ intersects $c$.

Parent graphs without any sink Vs are said to be decomposable. By Prop. 5, they are Markov equivalent to concentration graphs in the same node and edge set. Finding well-fitting models for them may often be based on small subsets of variables and, for judging their goodness of fit, re-estimation of parameters may not be needed, EZ|, ES|. Complex properties of estimates simplify for decomposable models as well, EH, 02). Strong analogies to Gaussian models result for binary variables with special types of graph, [159], especially when their distributions are jointly symmetric, mnumi.

For observational studies, it is of concern whether dependences can be well estimated when some variables are unobserved. As a first step, one needs to know when the parameters of such models can be identified. Considerable progress has been made regarding this in recent years; see [80], [30], [86], [TJ.

Some regression graph models for symmetric binary variables 

We now consider special models for symmetric binary variables, which compare most closely to Gaussian distributions generated over some regression graph for variables standardized to have mean zero and unit variance. The purpose is to illustrate for some binary distributions generated over simple Markov equivalent graphs that the corresponding models are also ‘parameter equivalent’, that is, there is a one-to-one relation between the parameters of two different models. This assures that the same transformation, which relates the parameters of the models, applies also to the maximum-likelihood estimates, ®, an important property that appears not to be shared by any of the more recently developed estimation methods.

The binary variables have levels $-1, 1$ and equal probabilities, $\tfrac{1}{2}$, allocated to each of their two levels. A consequence is that they have mean zero and unit variance by definition. Their covariance matrix $\Sigma$ therefore coincides with their correlation matrix; it has elements $\sigma_{ij} = \rho_{ij}$ and $\sigma_{ii} = 1$.

Induced marginal and partial correlations are just as for Gaussian distributions generated over the same graph, but a zero partial correlation in an induced concentration graph need not correspond to an independence statement; for an example see mni Appendix C, and see also [39].

For an ordered node set $N = (1, 2, 3, 4)$, we denote four symmetric binary variables by $A, B, C, D$, their respective levels by $i, j, k, l$ and abbreviate joint and conditional probabilities for instance as

$$f_{1234} = \pi^{ABCD}_{ijkl} = \Pr(A = i, B = j, C = k, D = l), \qquad \pi^{A|BCD}_{i|jkl} = \pi^{ABCD}_{ijkl} / \pi^{BCD}_{jkl}.$$


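Their joint distributions are generated over parent graphs with just main effects, for the complete parent graph as:

$$\begin{aligned}
\pi^{A|BCD}_{i|jkl} &= \tfrac{1}{2}(1 + \eta_{12}\,ij + \eta_{13}\,ik + \eta_{14}\,il), \\
\pi^{B|CD}_{j|kl} &= \tfrac{1}{2}(1 + \eta_{23}\,jk + \eta_{24}\,jl), \\
\pi^{C|D}_{k|l} &= \tfrac{1}{2}(1 + \eta_{34}\,kl), \\
\pi^{D}_{l} &= \tfrac{1}{2},
\end{aligned} \qquad (32)$$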


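with $\eta$s resulting from $\Sigma$ just like linear least-squares regression coefficients:

$$\begin{aligned}
\eta_{34} &= \rho_{34}, \\
(\eta_{23}\ \ \eta_{24}) &= (\rho_{23}\ \ \rho_{24})\,\Sigma^{-1}_{\{>2\}\{>2\}}, \\
(\eta_{12}\ \ \eta_{13}\ \ \eta_{14}) &= (\rho_{12}\ \ \rho_{13}\ \ \rho_{14})\,\Sigma^{-1}_{\{>1\}\{>1\}},
\end{aligned}$$

where $\Sigma^{-1}_{aa}$ denotes the inverse of a submatrix $\Sigma_{aa}$.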

This form of the $\eta$-parameters generalizes directly to $d > 4$ variables and stems from the close connection for binary variables between probabilities and expectations. For instance, by using equation (32),

$$E(B \mid C = k, D = l) = \eta_{23}\,k + \eta_{24}\,l,$$

and correlation coefficients are cross-sum differences in probabilities, uni, such as:

$$E(CD) = (\pi^{CD}_{11} + \pi^{CD}_{-1-1}) - (\pi^{CD}_{-11} + \pi^{CD}_{1-1}) = 2(\pi^{CD}_{11} - \pi^{CD}_{-11}) = \rho_{34}.$$


The second equality holds since in the generated distributions all odd order moments vanish so that there is also joint symmetry, [25], Appendix C. In this case, the probability of any level combination of these binary variables equals the probability of the level combination having each sign switched.
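
The link between probabilities and expectations for symmetric binary variables can be illustrated by building the joint probabilities from the main-effect regressions of equation (32). This is a sketch under stated assumptions: the numerical values of the $\eta$s are hypothetical and chosen only to keep all probabilities positive, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

LEVELS = (-1, 1)
HALF = Fraction(1, 2)


def joint_pmf(eta):
    """Joint probabilities of (A, B, C, D) built from the main-effect
    regressions of equation (32); eta maps pairs such as (1, 2) to eta_12."""
    pmf = {}
    for i, j, k, l in product(LEVELS, repeat=4):
        p_a = HALF * (1 + eta[1, 2] * i * j + eta[1, 3] * i * k + eta[1, 4] * i * l)
        p_b = HALF * (1 + eta[2, 3] * j * k + eta[2, 4] * j * l)
        p_c = HALF * (1 + eta[3, 4] * k * l)
        p_d = HALF
        pmf[i, j, k, l] = p_a * p_b * p_c * p_d
    return pmf


def expectation(pmf, g):
    """Expected value of g(i, j, k, l) under the joint distribution."""
    return sum(p * g(*levels) for levels, p in pmf.items())


def cond_mean_b(pmf, k, l):
    """E(B | C = k, D = l), computed directly from the joint probabilities."""
    cell = {lv: p for lv, p in pmf.items() if lv[2] == k and lv[3] == l}
    return sum(p * lv[1] for lv, p in cell.items()) / sum(cell.values())


# Hypothetical parameter values (assumption, not from the source).
eta = {(1, 2): Fraction(1, 5), (1, 3): Fraction(1, 10), (1, 4): Fraction(1, 4),
       (2, 3): Fraction(3, 10), (2, 4): Fraction(1, 5), (3, 4): Fraction(2, 5)}
pmf = joint_pmf(eta)

total = sum(pmf.values())
mean_c = expectation(pmf, lambda i, j, k, l: k)
rho_34 = expectation(pmf, lambda i, j, k, l: k * l)

# Marginal table of (C, D) and the cross-sum difference 2 * (pi_11 - pi_-11).
pi_cd = {(k, l): sum(p for lv, p in pmf.items() if (lv[2], lv[3]) == (k, l))
         for k, l in product(LEVELS, repeat=2)}
cross_sum = 2 * (pi_cd[1, 1] - pi_cd[-1, 1])

# Joint symmetry: switching every sign leaves the probability unchanged.
symmetric = all(pmf[lv] == pmf[tuple(-v for v in lv)] for lv in pmf)
```

With these choices the probabilities sum to one, each variable has mean zero, `rho_34` and `cross_sum` both equal $\eta_{34}$, and `cond_mean_b(pmf, k, l)` returns $\eta_{23} k + \eta_{24} l$.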

For binary variables in general, logit regressions, [2], are best suited to model conditional independence constraints. A logit regression is already close to a linear regression whenever the extreme events are not rare, but instead are more probable than, say, 0.1, B3t. It is in the special case of symmetric binary variables that the vanishing of linear regression coefficients in equation (32) coincides with the vanishing of logit regression coefficients in corresponding sequences of main-effect logit regressions. Thus, for instance,

$$1 \perp\!\!\!\perp 4 \mid \{2, 3\} \ \ (\eta_{14} = 0), \qquad 2 \perp\!\!\!\perp 3 \mid 4 \ \ (\eta_{23} = 0), \qquad 3 \perp\!\!\!\perp 4 \ \ (\eta_{34} = 0).$$

These binary distributions, generated over a given $G^N_{\mathrm{par}}$, have the edge matrix $\mathcal{A}$ of equation (jSJ), the same triangular decompositions of $\Sigma^{-1}$ and $\Sigma$ as in equation (7), and the same induced covariance and concentration graphs as in equation (9), even though $\Delta$ does not contain the conditional variances but, for $d > 2$, their expected values with respect to the past variables. We now turn to some Markov-equivalent regression graphs and models.


Example 3 The following graph captures mutual conditional independence of $A, B, C$ given $D$ for $(1, 2, 3, 4) = (A, B, C, D)$:

[Graph: parent graph with an arrow pointing from node 4 to each of the nodes 1, 2 and 3.]

For any type of distribution generated over this graph, the edge matrix $\mathcal{A}$ is the binary matrix defined by equation (JSJ) and the generated density is

$$f_{1234} = f_{1|4}\, f_{2|4}\, f_{3|4}\, f_{4} \iff (1 \perp\!\!\!\perp 2 \perp\!\!\!\perp 3) \mid 4.$$


Here, the four binary symmetric variables have the constraints $0 = \eta_{12} = \eta_{13} = \eta_{23}$ in equations (32). The triangular decomposition of $\Sigma$, that is the matrix pair $(A^{-1}, \Delta)$, leads to the special form of the correlation matrix with

$$A = \begin{pmatrix} 1 & 0 & 0 & -\rho_{14} \\ & 1 & 0 & -\rho_{24} \\ & & 1 & -\rho_{34} \\ 0 & & & 1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & \rho_{14}\rho_{24} & \rho_{14}\rho_{34} & \rho_{14} \\ \cdot & 1 & \rho_{24}\rho_{34} & \rho_{24} \\ \cdot & \cdot & 1 & \rho_{34} \\ \cdot & \cdot & \cdot & 1 \end{pmatrix},$$

and $\delta_{ss} = 1 - \rho_{s4}^{2}$ for $s = 1, 2, 3$, and $\delta_{44} = 1$. The induced correlations, corresponding to the three missing edges of the graph, are as specified for the outer nodes of a source V in equation (J 32 ]). Here every ancestor is a parent, hence $\mathcal{A}^{-} = \mathcal{A}$, and equation (9) gives a complete induced covariance graph and an induced concentration graph with no additional edge.
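
The triangular decomposition stated for this example can be verified without inverting any matrix, since $\Sigma^{-1} = A^{T} \Delta^{-1} A$ is equivalent to $A \Sigma A^{T} = \Delta$. The sketch below assumes hypothetical correlation values and illustrative helper names.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


# Hypothetical correlations of nodes 1, 2, 3 with their common parent 4.
r14, r24, r34 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)

A = [[1, 0, 0, -r14],
     [0, 1, 0, -r24],
     [0, 0, 1, -r34],
     [0, 0, 0, 1]]

# Correlation matrix with induced correlations rho_s4 * rho_t4
# for the three missing edges.
Sigma = [[1, r14 * r24, r14 * r34, r14],
         [r14 * r24, 1, r24 * r34, r24],
         [r14 * r34, r24 * r34, 1, r34],
         [r14, r24, r34, 1]]

# Diagonal matrix with delta_ss = 1 - rho_s4^2 and delta_44 = 1.
Delta = [[1 - r14 ** 2, 0, 0, 0],
         [0, 1 - r24 ** 2, 0, 0],
         [0, 0, 1 - r34 ** 2, 0],
         [0, 0, 0, 1]]

# Sigma^{-1} = A^T Delta^{-1} A is equivalent to A Sigma A^T = Delta.
residual_cov = matmul(matmul(A, Sigma), transpose(A))
```

The product `residual_cov` is diagonal and coincides with `Delta`, so the residuals of the three regressions on node 4 are uncorrelated, as the missing edges among nodes 1, 2, 3 require.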

Because the given $G^N_{\mathrm{par}}$ contains no collision V, it is Markov equivalent to $G^N_{\mathrm{con}}$ with the same node and edge set. The joint probabilities obtained from equations (32) show directly that the more important parameter equivalence holds in addition. Often, Markov equivalence implies parameter equivalence whenever a single parameter is attached to each edge present in $G^N_{\mathrm{par}}$.


Example 4 This example is a Markov chain graph, which is a parent graph consisting of a single direction-preserving path of arrows, here:

$$1 \leftarrow 2 \leftarrow 3 \leftarrow 4,$$

where each response node remembers from its past only the most recent node. For any type of distribution generated over this graph, the edge matrix $\mathcal{A}$ is the binary matrix defined by equation (jHJ) and the generated density is

$$f_{1234} = f_{1|2}\, f_{2|3}\, f_{3|4}\, f_{4} \iff (1 \perp\!\!\!\perp \{3, 4\} \mid 2 \text{ and } 2 \perp\!\!\!\perp 4 \mid 3).$$

For four binary symmetric variables and constraints $0 = \eta_{13} = \eta_{14} = \eta_{24}$ in equations (32), the triangular decomposition of $\Sigma$, the matrix pair $(A^{-1}, \Delta)$, gives the special form of the correlation matrix with

$$A = \begin{pmatrix} 1 & -\rho_{12} & 0 & 0 \\ & 1 & -\rho_{23} & 0 \\ & & 1 & -\rho_{34} \\ 0 & & & 1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & \rho_{12} & \rho_{12}\rho_{23} & \rho_{12}\rho_{23}\rho_{34} \\ \cdot & 1 & \rho_{23} & \rho_{23}\rho_{34} \\ \cdot & \cdot & 1 & \rho_{34} \\ \cdot & \cdot & \cdot & 1 \end{pmatrix},$$

and $\delta_{ss} = 1 - \rho_{s,s+1}^{2}$ for $s = 1, 2, 3$ and $\delta_{44} = 1$. The correlation induced for each missing edge in $G^N_{\mathrm{par}}$ equals the product of the correlations along the path connecting the uncoupled node pair. Here, every node in the past of $i$ is an ancestor of $i$, hence the ancestor graph with edge matrix $\mathcal{A}^{-}$ is complete. Consequently, the induced covariance graph is also complete.

As in Example 3, equation (9) gives an induced concentration graph with no additional edge. Here, $G^N_{\mathrm{par}}$ is Markov equivalent to a $G^N_{\mathrm{con}}$ which is a concentration chain in nodes $(1, 2, 3, 4)$:

$$1 \text{ --- } 2 \text{ --- } 3 \text{ --- } 4,$$

where each edge present means $i \pitchfork j \mid N \setminus \{i, j\}$. Also as in Example 1, there is parameter equivalence obtained from equation ((22]) to the parameters in the joint distribution:

$$\pi^{ABCD}_{ijkl} = \tfrac{1}{16}(1 + \rho_{12}\,ij)(1 + \rho_{23}\,jk)(1 + \rho_{34}\,kl).$$
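
That the product form of the joint probabilities reproduces the correlations of the Markov chain, including the products along paths for the missing edges, can be checked by direct enumeration. The correlation values below are hypothetical and the function name `corr` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical correlations along the chain 1 <- 2 <- 3 <- 4.
r12, r23, r34 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)

# Joint probabilities in product form, one factor per edge of the chain.
pmf = {
    (i, j, k, l): Fraction(1, 16)
    * (1 + r12 * i * j) * (1 + r23 * j * k) * (1 + r34 * k * l)
    for i, j, k, l in product((-1, 1), repeat=4)
}


def corr(pmf, s, t):
    """E(X_s X_t) for 0-based positions s, t; this equals the correlation
    because each variable has mean zero and unit variance."""
    return sum(p * lv[s] * lv[t] for lv, p in pmf.items())


total = sum(pmf.values())
rho_13 = corr(pmf, 0, 2)  # induced by the path 1 <- 2 <- 3
rho_14 = corr(pmf, 0, 3)  # induced by the path 1 <- 2 <- 3 <- 4
rho_24 = corr(pmf, 1, 3)  # induced by the path 2 <- 3 <- 4
```

The three induced correlations equal $\rho_{12}\rho_{23}$, $\rho_{12}\rho_{23}\rho_{34}$ and $\rho_{23}\rho_{34}$, while the correlations on the edges themselves are returned unchanged.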


Example 5 This last example is quite different from the previous one. It is a covariance chain in nodes $(1, 2, 3, 4)$:

$$1 \text{ - - - } 2 \text{ - - - } 3 \text{ - - - } 4,$$

where each $ij$-edge present represents in general $i \pitchfork j$. For the symmetric binary variables, the dependence is captured by the marginal correlation coefficient, $\rho_{ij} \neq 0$. The simplifying independences are in A and in $\Sigma$ but there are none for the joint distribution generated over a parent graph with node ordering $(1, 2, 3, 4)$. Accordingly, the factorizations and the independence structure are

$$(f_{124} = f_{12}\, f_{4} \text{ and } f_{134} = f_{1}\, f_{34}) \iff (\{1, 2\} \perp\!\!\!\perp 4 \text{ and } 1 \perp\!\!\!\perp \{3, 4\}).$$


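Because dashed-line Vs are edge-inducing by conditioning, induced regression coefficients appear in $A$, where $A^{T} \Delta^{-1} A = \Sigma^{-1}$,

$$A = \begin{pmatrix} 1 & -\eta_{12} & \eta_{12}\eta_{23} & -\eta_{12}\eta_{23}\eta_{34} \\ & 1 & -\eta_{23} & \eta_{23}\eta_{34} \\ & & 1 & -\eta_{34} \\ 0 & & & 1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & \rho_{12} & 0 & 0 \\ \cdot & 1 & \rho_{23} & 0 \\ \cdot & \cdot & 1 & \rho_{34} \\ \cdot & \cdot & \cdot & 1 \end{pmatrix}. \qquad (33)$$

Since there are no vanishing regression coefficients, there are also no independences of the type $i \perp\!\!\!\perp j \mid N \setminus \{i, j\}$; hence the induced concentration graph is complete.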

There can be sign changes for induced coefficients, for instance $\eta_{24} = -\eta_{23}\eta_{34}$. Separate estimation of the parameters in the regressions for responses $i < 3$ is not feasible for this model, since some of the regression coefficients depend on coefficients in the past of node $i$.
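
The sign change can be traced in a short worked step, added here for illustration. It uses only the least-squares form of the $\eta$-parameters together with the zero $\rho_{24} = 0$ of the covariance chain and $\eta_{34} = \rho_{34}$.

```latex
(\eta_{23}\ \ \eta_{24})
  = (\rho_{23}\ \ 0)
    \begin{pmatrix} 1 & \rho_{34} \\ \rho_{34} & 1 \end{pmatrix}^{-1}
  = \frac{1}{1-\rho_{34}^{2}}\,(\rho_{23}\ \ \ -\rho_{23}\rho_{34}),
\qquad \text{so that} \qquad
\eta_{24} = -\eta_{23}\,\rho_{34} = -\eta_{23}\,\eta_{34}.
```

The coefficient $\eta_{23}$ of response 2 thus involves $\rho_{34}$, a parameter belonging to the regression of node 3, which is why the regressions cannot be estimated separately.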

However, by Prop. 5, the given covariance chain is Markov equivalent to the following regression graph

$$1 \rightarrow 2 \text{ - - - } 3 \leftarrow 4,$$

which represents, for Gaussian distributions, the simplest type of Zellner's, % nza. 12a, seemingly unrelated regression. After reordering to $(2, 3, 1, 4)$, the covariance matrix here becomes $\Sigma'$ while partial inversion on the regressors 1, 4 gives the parameters for the joint response regression of $Y_a$ on $Y_b$, where $a = \{2, 3\}$ and $b = \{1, 4\}$:

$$\Sigma' = \begin{pmatrix} 1 & \rho_{23} & \rho_{12} & 0 \\ \cdot & 1 & 0 & \rho_{34} \\ \cdot & \cdot & 1 & 0 \\ \cdot & \cdot & \cdot & 1 \end{pmatrix}, \qquad \mathrm{inv}_b\,\Sigma' = \begin{pmatrix} 1 - \rho_{12}^{2} & \rho_{23} & \rho_{12} & 0 \\ \cdot & 1 - \rho_{34}^{2} & 0 & \rho_{34} \\ \sim & \sim & 1 & 0 \\ \sim & \sim & \cdot & 1 \end{pmatrix}.$$

The $\sim$ notation denotes entries that are symmetric up to the sign. This parametrization is equivalent to requesting

$$2 \pitchfork 3 \mid \{1, 4\}, \quad 2 \pitchfork 1 \mid 4, \quad 2 \perp\!\!\!\perp 4 \mid 1, \quad 3 \perp\!\!\!\perp 1 \mid 4, \quad 3 \pitchfork 4 \mid 1, \quad 1 \perp\!\!\!\perp 4.$$

It leads to joint probabilities which become, see [110] Appendix A:

$$\pi^{ABCD}_{ijkl} = \tfrac{1}{16}(1 + \rho_{12}\,ij + \rho_{23}\,jk + \rho_{34}\,kl + \rho_{12}\rho_{34}\,ijkl),$$

so that there is parameter equivalence in spite of a four-factor interaction.
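
As a check on the joint probabilities with the four-factor interaction, one can enumerate the 16 level combinations and recover the correlation matrix of the covariance chain. The sketch assumes hypothetical correlation values; the helper `moment` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical correlations for the edges 1-2, 2-3 and 3-4.
r12, r23, r34 = Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)

# Joint probabilities with three two-factor terms and one four-factor term.
pmf = {
    (i, j, k, l): Fraction(1, 16)
    * (1 + r12 * i * j + r23 * j * k + r34 * k * l + r12 * r34 * i * j * k * l)
    for i, j, k, l in product((-1, 1), repeat=4)
}


def moment(pmf, positions):
    """Expected product of the variables at the given 0-based positions."""
    total = 0
    for levels, p in pmf.items():
        term = p
        for s in positions:
            term *= levels[s]
        total += term
    return total


# Matrix of second moments; equal to the correlation matrix here.
Sigma = [[moment(pmf, (s, t)) for t in range(4)] for s in range(4)]
four_factor = moment(pmf, (0, 1, 2, 3))
all_positive = all(p > 0 for p in pmf.values())
```

The resulting `Sigma` has zeros exactly for the node pairs (1,3), (1,4) and (2,4), and the four-factor moment equals $\rho_{12}\rho_{34}$, so no parameter beyond the three correlations is needed.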

For $N = (2, 3, 1, 4)$, the three types of edge sets, E__ , _, E _ are captured by

Here, $\mathcal{E} = \mathrm{In}(\mathcal{W} + \mathcal{H} + \mathcal{H}^{T})$ is the generating edge matrix of $\Sigma$, due to Markov equivalence. With it, the edge matrix of the induced concentration graph becomes, by equation (TT31), $\mathcal{S}^{NN} = \mathrm{In}(L^{-1})$, where $L = 5I - \mathcal{E}$. This is just one way to see that here the induced concentration graph is complete.

These last three examples show that independences may mean simple zero constraints 
on parameters of one model but may appear as complex constraints, even in parameter 
equivalent models. It is therefore in general rarely useful to restrict model search and 
data analysis to one particular class of models. Strong prior knowledge would be an 
exception. Even then, using Markov- and parameter-equivalence may aid in finding 
alternative interpretations and alternative fitting algorithms. 

If causal hypotheses are the motivation for designing an empirical study, then undirected graphical models alone are typically of little interest. But similarly, directed acyclic graph models are of little help when one expects that an intervention will lead to changes in several connected responses at the same time. For instance, when effects of a drug to reduce blood pressure are to be studied, this intervention will affect systolic and diastolic blood pressure simultaneously and not one before the other.

Discussion 

It took nearly 40 years of research until the present form of the regression graph, $G^N_{\mathrm{reg}}$, was defined and its properties and consequences could be studied. The graph represents ordered sequences of joint response regressions. Responses may depend on all or on only some of the variables in their past. The graph contains three types of edge, one undirected type for dependences among responses, another undirected type for dependences among context variables and directed edges pointing to a response from nodes in its past. Conditional dependences show in edges present in $G^N_{\mathrm{reg}}$. These dependences simplify with more missing edges, that is, the more conditional independence constraints there are.

To make the graphs useful tools for tracing developmental pathways and for predicting structure in alternative models, the generated distributions have to mimic some properties of joint Gaussian distributions, see Prop. CD. If in $G^N_{\mathrm{reg}}$, independences did not combine downwards and upwards, that is if the intersection and the composition properties were not satisfied, it would be impossible to infer mutual independence of disconnected subgraphs. Then, the graphical representations would be nearly useless.

If regression graphs were not, in addition, singleton-transitive, then they would not 
even well represent Gaussian distributions which have this property and are the simplest 
and most studied types of joint distribution. Also, edges present in induced graphs would 
not point to non-vanishing conditional dependences in traceable regressions. But this is 
a prerequisite for useful tracings of pathways of development in the graphs. 

Connector-set transitivity will illuminate the distinction between structural independences and those that may result due to special parametric constellations. The distinction between the two types reflects a long-standing practice in empirical research. Whenever a result has been replicated in several studies under essentially the same conditions, one typically still wants to establish it under modified conditions.

Even when all edges present in a graph correspond to positive dependences, negative linear dependences are induced by closing collision Vs; see equation (JHj) and equation mil. Some first results for preserving positive dependences have been obtained, [52], [ 4U]; others are expected for totally positive distributions generated over $G^N_{\mathrm{con}}$ and for decomposable regression graphs, those without any collision V.

Among the open theoretical questions are the following: Can necessary and sufficient 
conditions be derived for the properties of traceable regressions, such as those for the 
intersection property, [78]? For this, can methods of algebraic statistics also be helpful, 
such as those for binary tree models, H221? How will independence structures and their 
properties change when graphs are no longer finite, [59]? When may models with fewer 
independence constraints, that is with more edges in the graph, be safely used as covering 
models, [16], for simpler estimation and useful interpretation? 

At least equally important are further direct applications of traceable regressions; 
for a summary of the related tasks and links to detailed reports on finding well-fitting 
models in different research contexts see [105]. With traceable regressions, it has become 
feasible, for the first time, to derive the structural consequences of (1) ignoring some of 
the variables, of (2) selecting subpopulations via fixed levels of some other variables or 
of (3) changing the order in which the variables might get generated. With the currently 
used methods for combining results from empirical studies, called ‘meta-analyses’,
such effects are not taken care of. Therefore, the most important future applications of
these models will aim at the best possible integration of knowledge from related studies.